SUBTILISIN VARIANTS CAPABLE OF CLEAVING SUBSTRATES
CONTAINING BASIC RESIDUES 


FIELD OF THE INVENTION 

This invention relates to subtilisin variants having altered specificity from wild-type subtilisins.
Specifically, the subtilisin variants are modified so that they efficiently and selectively cleave substrates
containing basic residues. The invention further relates to the DNA encoding these novel polypeptides, as well 
as the recombinant materials and methods for producing these subtilisin variants. In a particular aspect, the 
present invention provides for processes for cleaving protein substrates containing basic residues. 

BACKGROUND OF THE INVENTION 

Site-specific proteolysis is one of the most common forms of post-translational modification of proteins
(for review see Neurath, H. (1989) Trends Biochem. Sci., 14:268). In addition, proteolysis of fusion proteins in
vitro is an important research and commercial tool (for reviews see Uhlen, M. and Moks, T. (1990) Methods
Enzymol., 185:129-143; Carter, P. (1990) in Protein Purification: From Molecular Mechanisms to Large-Scale
Processes, M.R. Ladisch, R.C. Willson, C.C. Painton, S.E. Builder, Eds. (ACS Symposium Series 427,
American Chemical Society, Washington, D.C.), Chap. 13, pp. 181-193; and Nilsson, B. et al. (1992) Current
Opin. Struct. Biol., 2:569). Expressing a protein of interest as a fusion protein facilitates purification when the
fusion contains an affinity domain such as glutathione-S-transferase, Protein A or a poly-histidine tail. The
fusion domain can also facilitate high level expression and/or secretion.

Liberating the protein product from the fusion domain requires selective and efficient cleavage of the
fusion protein. Both chemical and enzymatic methods have been employed (see references above). Enzymatic
methods are generally preferred as they tend to be more specific and can be performed under mild conditions
that avoid denaturation or unwanted chemical side-reactions. A number of natural and even designed enzymes
have been applied for site-specific proteolysis. Although some are generally more useful than others (Forsberg,
G., Baastrup, B., Rondahl, H., Holmgren, E., Pohl, G., Hartmanis, M. and Lake, M. (1992) J. Prot. Chem.,
11:201-211), no single one is applicable to every situation, given the sequence requirements of the fusion protein
junction and the possible existence of protease recognition sequences within the desired protein product. Thus, an
expanded array of sequence-specific proteases, analogous to restriction endonucleases, would make site-specific
proteolysis a more widely used method for processing fusion proteins or generating protein/peptide fragments
either in vitro or in vivo.

The processing of prohormones by the KEX2-related family of serine endoproteases illustrates one of
the most precise proteolytic events found in nature (for reviews see Steiner, D. F., Smeekens, S. P., Ohagi, S. and
Chan, S. J. (1992) J. Biol. Chem., 267, 23435-23438 and Smeekens, S. P. (1993) Bio/Technology 11, 182-186).
This family of proteases, which includes the yeast KEX2 and the mammalian PC2, PC3 and furin enzymes, is
homologous to the bacterial serine protease subtilisin (Kraut, J. (1977) Annu. Rev. Biochem., 46:331-358).
Subtilisin has a broad substrate specificity that reflects its role as a scavenger protease. In contrast, these
eukaryotic enzymes are very specific for cleaving substrates containing two basic residues and are thus well-suited
for site-specific proteolysis.

All of these eukaryotic enzymes strongly require Arg at the P1 position, and either Arg, Lys or Pro at
the P2 position of peptide substrates. The prohormone convertases from higher eukaryotes such as furin, PC2,
and PC3 also have an absolute requirement for Arg at the P4 position (Bresnahan, P. A., Leduc, R., Thomas, L.,
Thorner, J., Gibson, H. L., Brake, A. J., Barr, P. J. and Thomas, G. (1990) J. Cell. Biol. 111, 2851; Wise, R. J.,
Barr, P. J., Wong, P. A., Kiefer, M. C., Brake, A. J., and Kaufman, R. J. (1990) Proc. Natl. Acad. Sci. USA 87,
9378-9382; Hosaka, M., Nagahama, M., Kim, W.-S., Watanabe, T., Hatsuzawa, K., Ikemizu, J., Murakami,
K., and Nakayama, K. (1991) J. Biol. Chem. 266, 12127-12130; Matthews, D. J., Goodman, L. J., Gorman, C.
M., and Wells, J. A. (1994) Protein Science 3, 1197-1205).

Despite the very narrow specificity of the pro-hormone processing enzymes, in some cases they are
capable of rapid cleavage of target sequences. For example, the kcat/Km ratio for KEX2 to cleave a good
substrate (e.g. acetyl-pMYRK-MCA) is 1.1 x 10^7 (Brenner, C., and Fuller, R.S. (1992) Proc. Natl. Acad.
Sci. USA, 89:922-926) compared to 3 x 10^5 for subtilisin cleaving a good substrate (e.g. suc-AAPF-pNA) (Estell,
D. A., Graycar, T. P., Miller, J. V., Powers, D. B., Burnier, J. P., Ng, P. G. and Wells, J.A. (1986) Science,
233:659-663).
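
As a rough illustration of the comparison above, the two quoted kcat/Km values can be set against each other directly. This is only a sketch: it assumes both values are expressed in the same units, and the constant and function names are illustrative rather than taken from the cited work.

```python
# kcat/Km values quoted in the text (assumed here to share the same units).
KEX2_KCAT_KM = 1.1e7        # KEX2 cleaving a good substrate
SUBTILISIN_KCAT_KM = 3e5    # subtilisin cleaving suc-AAPF-pNA


def fold_difference(a: float, b: float) -> float:
    """Return how many times larger catalytic efficiency a is than b."""
    if b <= 0:
        raise ValueError("catalytic efficiency must be positive")
    return a / b


ratio = fold_difference(KEX2_KCAT_KM, SUBTILISIN_KCAT_KM)
print(f"KEX2 is roughly {ratio:.0f}-fold more efficient on its preferred substrate")
```

Under that assumption, the narrowly specific enzyme is still more than an order of magnitude faster on its best substrate than subtilisin is on its own.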

However, the eukaryotic proteases are expressed in small amounts (Bravo, D. B., Gleason, J. B.,
Sanchez, R. I., Roth, R. A., and Fuller, R. S. (1994) J. Biol. Chem., 269:25830-25837 and Matthews, D. J.,
Goodman, L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science, 3:1197-1205), making them
impractical at present for processing fusion proteins in vitro. Subtilisin BPN', however, can be
expressed in large amounts (Wells, J.A., Ferrari, E., Henner, D.J., Estell, D.A. and Chen, E.Y. (1983) Nucl.
Acids Res., 11:7911-7929).

Extensive protein engineering studies of subtilisin, and especially subtilisin BPN', have identified
several residues in the S1 and S2 active site of the enzyme where amino acid substitutions lead to large changes
in substrate specificity (Wells, J. A., and Estell, D.A., (1988) Trends Biochem. Sci., 13:291-297; Carter, P., et
al., (1989) PROTEINS: Structure, Function, and Genetics, 6:240-248). X-ray crystal structures of subtilisin
containing bound transition state analogues (Wright, C. S., Alden, R. A. and Kraut, J. (1969) Nature, 221:235-242;
McPhalen, C.A. and James, M.N.G. (1988) Biochemistry, 27:6582-6598; Bode, W., Papamokos, E., Musil,
D., Seemueller, U. and Fritz, M. (1986) EMBO J., 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A., Graycar,
T., Katz, B. and Power, S. (1988) J. Biol. Chem., 263:7895-7906) can be used to locate active site residues that
are in close proximity to side chains at key positions in substrate peptides (Wells, J.A., (1987) Proc. Natl. Acad.
Sci. USA 84:1219-1223). Consideration of electrostatic interactions between charged peptide substrates and
subtilisin can be used to tailor the substrate binding cleft of subtilisin BPN' to favor complementary charged
substrates (Wells, J.A., et al., (1987) Proc. Natl. Acad. Sci. USA, 84:1219-1223). Previous work has shown that
replacement of residues at positions 156 and 166 in the S1 binding site of subtilisin BPN' with various charged
residues leads to improved specificity for complementary charged substrates.

A substantial amount of protein engineering has been applied to the specificity determinants of the S4
subsite of subtilisin BPN' in efforts to alter specificity for P4 substrates (Eder, J., Rheinnecker, M., and Fersht,
A. R. (1993) FEBS Lett. 335, 349-352; Rheinnecker, M., Baker, G., Eder, J., and Fersht, A. R. (1993)
Biochemistry 32, 1199-1203; Rheinnecker, M., Eder, J., Pandey, P.S., and Fersht, A. R. (1994) Biochemistry 33,
221-225). However, the mutations introduced consisted entirely of hydrophobic substitutions, thus preserving
the overall hydrophobic substrate preference in the site.

Previous attempts to introduce, remove or reverse charge specificity in enzyme active sites have
met with considerable difficulty. This has generally been attributed to a lack of stabilization of the introduced
charge or enzyme-substrate ion pair complex by the wild-type enzyme environment (Hwang, J.K. and Warshel,
A. (1988) Nature, 334:270-272). For example, Stennicke et al. (Stennicke, H.R.; Mortensen, U.H.; Christensen, U.;
Remington, S.J.; and Breddam, K. (1994) Prot. Eng. 7:911-916) made acidic (D/E) mutations at five residues in
the P1' binding site of carboxypeptidase Y in an attempt to change the P1' preference from Phe to Lys/Arg. Only the
L272D and L272E mutations were found to alter the specificity in the desired direction, up to a 1.3-fold preference
for Lys/Arg over Phe, and the others simply resulted in less active enzymes having substrate preferences similar
to wild-type. In the case of trypsin, a protease that is highly specific for basic P1 residues, recruitment of
chymotrypsin-like (hydrophobic P1) specificity required not only mutation of the ion pair-forming Asp189 to
Ser, but also transplantation of two more distant surface loops from chymotrypsin (Graf, L., Jancso, A., Szilagyi,
L., Hegyi, G., Pinter, K., Naray-Szabo, G., Hepp, J., Medzihradszky, K., and Rutter, W. J., Proc. Natl. Acad.
Sci. USA (1988) 85:4961-4965 and Hedstrom, L., Szilagyi, L., and Rutter, W. J., Science (1992) 255:1249-1253).

In the present work, we have also verified that relatively low specificity is gained by introducing single
ion-pairs between enzyme and substrate. However, when two or more choice ionic interactions were
simultaneously engineered into subtilisin BPN', the resulting variants had higher specificity for basic residues
in each of the subsites due to a non-additive effect.

Accordingly, it is an object to produce a subtilisin variant with basic specificity for use in processing 
pro-proteins made by recombinant techniques. 

SUMMARY OF THE INVENTION 

The present invention provides for subtilisin variants with altered substrate specificity. Preferred
subtilisin variants are highly specific for the efficient cleavage of substrates containing basic residues. The
subtilisin variants have a substrate specificity which is substantially different from the substrate specificity of the
precursor subtilisin from which the amino acid sequence of the variant is derived. The amino acid sequences of
the subtilisin variants are derived by the substitution of one or more amino acids of a precursor subtilisin amino
acid sequence.

In a preferred aspect of the present invention, the subtilisin variants of the present invention are specific
for the cleavage of protein substrates containing basic amino acid residues at substrate positions P1, P2 and P4.
According to this aspect of the present invention, subtilisin variants having amino acid substitutions at positions
corresponding to amino acid positions 62, 104 and 166 of subtilisin BPN' produced by Bacillus
amyloliquefaciens are preferred. Accordingly, subtilisin variants are provided wherein amino acids 62, 104 and
166 of subtilisin BPN' are substituted with acidic amino acids. Preferably the acidic amino acid is Asp or Glu,
and most preferably Asp.

Preferred substrates for the subtilisin variants according to this aspect of the present invention contain
either Lys (K) or Arg (R) at substrate positions P2 and P1, practically any residue at P3, and preferably either
Lys or Arg at P4, and again practically any residue at P5. Thus an exemplary good substrate would contain
-Asn-Arg-Met-Arg-Lys- (SEQ ID NO: 76) at -P5-P4-P3-P2-P1- respectively. Additionally, good substrates would
not have Pro at P1', P2', or P3', nor would Ile be present at P1'.
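
The substrate preferences just listed can be written down as a simple check on a candidate cleavage site. The following is a minimal sketch, not part of the disclosure: the function name is hypothetical, and it treats the stated preference for a basic residue at P4 as a hard requirement, which is a simplifying assumption.

```python
BASIC = {"K", "R"}  # Lys and Arg, one-letter code


def is_preferred_substrate(p_side: str, p_prime_side: str) -> bool:
    """Check a candidate site against the P5-P1 / P1'-P3' preferences.

    p_side       -- residues P5..P1, written N- to C-terminal (length 5)
    p_prime_side -- residues P1'..P3', written N- to C-terminal (length 3)
    """
    if len(p_side) != 5 or len(p_prime_side) != 3:
        raise ValueError("expected 5 residues before and 3 after the scissile bond")
    _p5, p4, _p3, p2, p1 = p_side
    # P1 and P2 must be basic; P4 is treated as basic too (assumption: the
    # text only says "preferably"). P3 and P5 may be practically any residue.
    if p1 not in BASIC or p2 not in BASIC or p4 not in BASIC:
        return False
    # No Pro at P1', P2' or P3', and no Ile at P1'.
    if "P" in p_prime_side or p_prime_side[0] == "I":
        return False
    return True


# The exemplary P5-P1 sequence from the text; the P' residues are made up.
print(is_preferred_substrate("NRMRK", "AQS"))
```

Relaxing the P4 test would turn the check into one for the minimal dibasic P2-P1 requirement only.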

According to a second aspect of the present invention, the subtilisin variants are capable of cleaving
protein substrates having basic residues at positions P1 and P2. According to this aspect of the present invention,
subtilisin variants having amino acid substitutions at positions corresponding to amino acid positions 62 and 166
of subtilisin BPN' produced by Bacillus amyloliquefaciens are preferred. The preferred subtilisin variants having
substrate specificity for dibasic substrates have an acidic amino acid residue at residue position 62 of subtilisin
naturally produced by Bacillus amyloliquefaciens. In a preferred embodiment, the naturally occurring Asn at
residue position 62 of subtilisin BPN' is substituted with an acidic amino acid residue such as Glu or
Asp, and most preferably Asp. The preferred subtilisin variants, having substrate specificity for substrates having
dibasic amino acid residues, additionally have an acidic residue, Asp or Glu, at residue position 166 of subtilisin
BPN'. Thus, subtilisin BPN' variants containing substitutions of amino acids 62 and 166 with the acidic amino
acids Glu or Asp are preferred. In particular, a subtilisin variant having amino acid Asp at positions 62 and 166
is preferred (subtilisin BPN' variant N62D/G166D). The subtilisin variants according to this aspect of the
invention may be used to cleave substrates containing dibasic residues, such as fusion proteins with dibasic
substrate linkers, and to process hormones or other proteins (in vitro or in vivo) that contain dibasic cleavage
sites.

Preferred substrates for the subtilisin BPN' variant N62D/G166D contain either Lys (K) or Arg (R) at
substrate positions P2 and P1, practically any residue at P3, a non-charged hydrophobic residue at P4, and again
practically any residue at P5. Thus an exemplary good substrate would contain -Asn-Leu-Met-Arg-Lys- (SEQ
ID NO: 35) at -P5-P4-P3-P2-P1- respectively. Additionally, good substrates would not have Pro at P1', P2', or
P3', nor would Ile be present at P1'.

The invention also includes mutant DNA sequences encoding such subtilisin variants. These mutant
DNA sequences are derived from a precursor DNA sequence which encodes a naturally occurring or recombinant
precursor subtilisin. The mutant DNA sequence is derived by modifying the precursor DNA sequence to encode
the substitution(s) of one or more amino acids encoded by the precursor DNA sequence. These recombinant
DNA sequences encode mutants having an amino acid sequence which does not exist in nature and a substrate
specificity which is substantially different from the substrate specificity of the precursor subtilisin encoded by
the precursor DNA sequence.

Further, the invention includes expression vectors containing such mutant DNA sequences as well as host
cells transformed with such vectors which are capable of expressing the subtilisin variants.

The invention also provides for a process for cleaving a polypeptide such as a fusion protein containing
a substrate linker represented by the formula:

P4-P3-P2-P1

wherein P4 is a basic amino acid or a large hydrophobic amino acid such as Leu or Met; P3 is an amino acid
selected from the naturally occurring amino acids; P2 is a basic amino acid; and P1 is a basic amino acid. The
process includes the step of subjecting the polypeptide to the subtilisin variants described herein under conditions
such that the subtilisin variant cleaves the polypeptide. 
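
To make the P4-P3-P2-P1 linker formula concrete, a sequence can be scanned for stretches that fit it. This is a hedged sketch only: the pattern and function names are hypothetical, "large hydrophobic" is restricted to the two residues named in the text (Leu and Met), and residues on the P' side are not examined.

```python
import re

# P4: basic (K/R) or large hydrophobic. Only Leu and Met are named in the
# text, so limiting the hydrophobic choices to L/M is an assumption.
# P3: any residue. P2 and P1: basic (K/R).
# A lookahead is used so that overlapping candidate sites are all reported.
LINKER_PATTERN = re.compile(r"(?=([KRLM][A-Z][KR][KR]))")


def find_linker_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (index, motif) pairs for every P4-P1 match.

    index is the 0-based position immediately after P1, i.e. where the
    P1' residue would sit (it may equal len(sequence) for a C-terminal match).
    """
    seq = sequence.upper()
    return [(m.start() + 4, m.group(1)) for m in LINKER_PATTERN.finditer(seq)]


# Toy sequence (made up) containing the Leu-Met-Arg-Lys motif.
print(find_linker_sites("GGSNLMRKAQS"))
```

A scan like this also flags candidate sites inside the protein product itself, which is the practical concern raised in the background section about protease sequences within the desired product.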


BRIEF DESCRIPTION OF THE FIGURES 
Figure 1. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active 
site of subtilisin BPN' showing the S2 and S1 binding pocket residues subjected to mutagenesis.

Figure 2. Kinetic analysis of S1 binding site subtilisin mutants versus substrates having variable P1
residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration
for the tetrapeptide series succinyl-Ala-Ala-Pro-Xaa-pNa (SEQ ID NO: 69), where Xaa was Lys (SEQ ID NO:
58), Arg (SEQ ID NO: 59), Phe (SEQ ID NO: 56), Met (SEQ ID NO: 60) or Gln (SEQ ID NO: 61) (defined to
the right of the plot).

Figure 3. Kinetic analysis of S2 binding site subtilisin mutants versus substrates having variable P2
residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration
for the tetrapeptide series succinyl-Ala-Ala-Xaa-Phe-pNa (SEQ ID NO: 70), where Xaa was Lys (SEQ ID NO:
62), Arg (SEQ ID NO: 64), Ala (SEQ ID NO: 63), Pro (SEQ ID NO: 56), or Asp (SEQ ID NO: 65) (defined on
the right of the plot).

Figure 4. Kinetic analysis of combined S1 and S2 binding site subtilisin mutants versus substrates having
variable P1 and P2 residues. The kinetic constants kcat/Km were determined from plots of initial rates versus
substrate concentration for the tetrapeptide series succinyl-Ala-Ala-Xaa2-Xaa1-pNa (SEQ ID NO: 71), where
Xaa2-Xaa1 was Lys-Lys (SEQ ID NO: 66), Lys-Arg (SEQ ID NO: 67), Lys-Phe (SEQ ID NO: 62), Pro-Lys (SEQ
ID NO: 58), Pro-Phe (SEQ ID NO: 56), or Ala-Phe (SEQ ID NO: 63) (defined on the right of the plot).

Figure 5. Results of hGH-AP fusion protein assay. hGH-AP fusion proteins were constructed, bound to
hGHbp-coupled resin, and treated with 0.3 nM N62D/G166D subtilisin in 20 mM Tris-Cl pH 8.2. Aliquots were
withdrawn at various times and AP release was monitored by activity assay in comparison to a standard curve.
Arrows indicate the cleavage site. The rate of cleavage of fusion proteins containing various substrate linkers is
shown. Substrates containing a Pro at position P1' are not cleaved.

Figure 6-1 - 6-10. (Collectively referred to herein as Fig. 6). DNA sequence of the phagemid pSS5
containing the N62D/G166D double mutant subtilisin BPN' gene (SEQ ID NO: 1), and translated amino acid
sequence for the mutant preprosubtilisin (SEQ ID NO: 2). The pre region is comprised of residues -107 to -78,
the pro of residues -77 to -1, and the mature enzyme of residues +1 to +275 (SEQ ID NO: 72). Also shown are
restriction sites recognized by endonucleases that require 6 or more specific bases in succession.

Figure 7. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active
site of subtilisin BPN' showing the S1, S2, and S4 binding pocket residues subjected to mutagenesis.

Figure 8. DNA sequence of the N62D/Y104D/G166D triple mutant (SEQ ID NO: 74) as well as the
translated amino acid sequence (SEQ ID NO: 75). The pre region is comprised of residues -107 to -78, the pro
residues -77 to -1 and the mature enzyme +1 to +275. The pro region reflects the changes A(-4)R/A(-2)K/Y(-1)R
made in the wild-type processing site to affect expression.




WO 96/27671 PCT/US96/02861

DETAILED DESCRIPTION OF THE INVENTION 

Definitions 

Terms used in the claims and specification are defined as set forth below unless otherwise specified.
The term amino acid or amino acid residue, as used herein, refers to naturally occurring L amino acids or
residues, unless otherwise specifically indicated. The commonly used one- and three-letter abbreviations for
amino acids are used herein (Lehninger, A. L., Biochemistry, 2d ed., pp. 71-92, Worth Publishers, N.Y. (1975)).
Basic amino acids are Arg and Lys. Acidic amino acids are Asp and Glu.

Substrates are described in triplet or single letter code as Pn...P2-P1-P1'-P2'...Pn'. The "P1" residue refers
to the position preceding (i.e., N-terminal to) the scissile peptide bond (i.e. between the P1 and P1' residues)
of the substrate as defined by Schechter and Berger (Schechter, I. and Berger, A., Biochem. Biophys. Res.
Commun. 27:157-162 (1967)). Similarly, the term P1' is used to refer to the position following (i.e., C-terminal
to) the scissile peptide bond of the substrate. Increasing numbers refer to the next consecutive position preceding
(e.g., P2 and P3) and following (e.g., P2' and P3') the scissile bond. According to the present invention the
scissile peptide bond is that bond that is cleaved by the subtilisin variants of the instant invention.

"Subtilisins," "precursor subtilisin" and the like are bacterial carbonyl hydrolases which generally act to
cleave peptide bonds of proteins or peptides. As used herein, "subtilisin" means a naturally occurring subtilisin
or a recombinant subtilisin. A series of naturally occurring subtilisins are known to be produced and often
secreted by various bacterial species (Siezen, R.J., et al., (1991) Protein Engineering 4:719-737). Amino acid
sequences of the members of this series are not entirely homologous. However, the subtilisins in this series
exhibit the same or similar type of proteolytic activity. This class of serine proteases shares a common amino
acid sequence defining a catalytic triad which distinguishes them from the chymotrypsin related class of serine
proteases. The subtilisins and chymotrypsin related serine proteases both have a catalytic triad comprising
aspartate, histidine and serine. In the subtilisin related proteases the relative order of these amino acids, reading
from the amino to carboxy terminus, is aspartate-histidine-serine. In the chymotrypsin related proteases the
relative order, however, is histidine-aspartate-serine. Thus, subtilisins as used herein refer to a serine protease
having the catalytic triad of subtilisin related proteases.

Generally, subtilisins are serine endoproteases having molecular weights of about 27,500 which are
secreted in large amounts from a wide variety of Bacillus species. The protein sequences of subtilisins have been
determined from at least four different species of Bacillus (Markland, F.S., et al. (1971) in The Enzymes, ed.
Boyer P.D., Acad. Press, New York, Vol. III, pp. 561-608; and Nedkov, P. et al. (1983) Hoppe-Seyler's Z.
Physiol. Chem. 364:1537-1540). The three-dimensional crystallographic structures of four subtilisins have been
reported (BPN' from Bacillus amyloliquefaciens, Hirono et al. (1984) J. Mol. Biol. 178:389-413; subtilisin
Carlsberg from Bacillus licheniformis, Bode et al., (1986) EMBO J., 5:813-818; thermitase from
Thermoactinomyces vulgaris, Gros et al., (1989) J. Mol. Biol. 210:347-367; and proteinase K from Tritirachium
album, Betzel, et al., (1988) Acta Crystallogr., B, 44:163-172). The three-dimensional structure of subtilisin
BPN' (from B. amyloliquefaciens) to 2.5 Å resolution has also been reported by Wright, C.S. et al. (1969) Nature
221:235-242 and Drenth, J. et al. (1972) Eur. J. Biochem. 26:177-181. These studies indicate that although
subtilisin is genetically unrelated to the mammalian serine proteases, it has a similar fold and active site structure.
The x-ray crystal structures of subtilisin containing covalently bound peptide inhibitors (Robertus, J.D., et al.
(1972) Biochemistry 11:2439-2449), product complexes (Robertus, J.D., et al. (1972) Biochemistry 11:4293-4303),
and transition state analogs (Matthews, D.A., et al. (1975) J. Biol. Chem. 250:7120-7126 and Poulos,
T.L., et al. (1976) J. Biol. Chem. 251:1097-1103), which have been reported, have also provided information
regarding the active site and putative substrate binding cleft of subtilisins. In addition, a large number of kinetic
and chemical modification studies have been reported for subtilisins (Philipp, M., et al. (1983) Mol. Cell.
Biochem. 51:5-32; Svendsen, I.B. (1976) Carlsberg Res. Comm. 41:237-291 and Markland, F.S. Id.) as well
as at least one report wherein the side chain of methionine at residue 222 of subtilisin was converted by hydrogen
peroxide to methionine-sulfoxide (Stauffer, D.C., et al. (1969) J. Biol. Chem. 244:5333-5338).

"Subtilisin variant," "subtilisin mutant" and the like refer to a subtilisin-type serine protease having a
sequence which is not found in nature that is derived from a precursor subtilisin according to the present
invention. The subtilisin variant has a substrate specificity different from the precursor subtilisin by virtue of
amino acid substitutions within the precursor subtilisin amino acid sequence. The term is meant to include
subtilisin variants in which the DNA sequence encoding the precursor subtilisin is modified to produce a mutant
DNA sequence which encodes the substitution of one or more amino acids in the naturally occurring subtilisin
amino acid sequence. Suitable methods to produce such modification include those disclosed in U.S. Patent Nos.
4,760,025 and 5,371,008 and in EPO Publication Nos. 0130756 and 0251446.

A change in substrate specificity is defined as a difference between the kcat/Km ratio of the precursor
subtilisin and the subtilisin variant. The kcat/Km ratio is a measure of catalytic efficiency. Subtilisin variants
with increased or decreased kcat/Km ratios compared to the precursor subtilisin from which they were derived
are described herein. Generally, the objective is to secure a variant having a greater, i.e. numerically larger,
kcat/Km ratio for a given substrate. A greater kcat/Km ratio for a particular substrate indicates that the variant
may be used to more efficiently cleave the target substrate.

The specificity or discrimination between two or more competing substrates is determined by the ratios
of kcat/Km (Fersht, A.R., (1985) in Enzyme Structure and Mechanism, W.H. Freeman and Co., N.Y., p. 112). An
increase in kcat/Km ratio for one substrate may be accompanied by a reduction in kcat/Km ratio for another
substrate. This shift in substrate specificity indicates that the variant subtilisin with the increased kcat/Km ratio
for the substrate has utility in cleaving the particular substrate over the precursor subtilisin in, for example,
preventing undesirable hydrolysis of a particular substrate in a mixture of substrates.
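
The statement that discrimination between competing substrates is set by the ratio of their kcat/Km values can be sketched numerically. For two substrates A and B competing for the same enzyme, the standard relation is v_A / v_B = ((kcat/Km)_A x [A]) / ((kcat/Km)_B x [B]). The function name and the numbers below are illustrative assumptions, not data from this specification.

```python
def relative_rate(kcat_km_a: float, conc_a: float,
                  kcat_km_b: float, conc_b: float) -> float:
    """Ratio of cleavage rates v_A / v_B for two competing substrates.

    kcat_km_* -- catalytic efficiencies (same units for both substrates)
    conc_*    -- substrate concentrations (same units for both substrates)
    """
    if min(kcat_km_a, kcat_km_b, conc_a, conc_b) <= 0:
        raise ValueError("all inputs must be positive")
    return (kcat_km_a * conc_a) / (kcat_km_b * conc_b)


# Hypothetical values: a variant that is 100-fold more efficient on a
# dibasic target than on a hydrophobic off-target, both at equal concentration.
print(relative_rate(1e5, 1.0, 1e3, 1.0))
```

Because concentrations enter the ratio, an off-target substrate present in large excess can still be cleaved at an appreciable rate even when the kcat/Km ratio strongly favors the target.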

In general, for a subtilisin variant to have a useful catalytic efficiency for cleavage of a particular substrate,
the kcat/Km ratio will generally be between about 1 x 10^3 M^-1 s^-1 and about 1 x 10^7 M^-1 s^-1. More often, the
kcat/Km ratio will be between about 1 x 10^4 M^-1 s^-1 and 1 x 10^6 M^-1 s^-1.

When referring to mutants or variants, the wild type amino acid residue is followed by the residue number
and the new or substituted amino acid residue. For example, substitution of D for wild type N in residue position
62 is denominated N62D.

"Subtilisin variants or mutants" are designated in the same manner by using the single letter amino acid
code for the wild-type residue followed by its position and the single letter amino acid code of the replacement
residue. Multiple mutants are indicated by component single mutants separated by slashes. Thus the subtilisin
BPN' variant N62D/G166D is a di-substituted variant in which Asp replaces Asn and Gly at residue positions
62 and 166, respectively, in wild-type subtilisin BPN'.
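
The naming convention described above (wild-type residue, position, replacement residue, with slashes between single mutants) lends itself to a small parser. The following is a minimal sketch under stated assumptions: the function names are hypothetical, positions are 1-based on the mature sequence, and the toy sequence in the usage line is invented for illustration.

```python
import re

# One mutation: wild-type letter, 1-based residue number, replacement letter.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(name: str) -> list[tuple[str, int, str]]:
    """Split a name such as 'N62D/G166D' into (wild_type, position, new) tuples."""
    mutations = []
    for part in name.split("/"):
        match = MUTATION_RE.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised mutation: {part!r}")
        wild_type, position, replacement = match.groups()
        mutations.append((wild_type, int(position), replacement))
    return mutations


def apply_variant(mature_sequence: str, name: str) -> str:
    """Apply the substitutions to a mature sequence, verifying each wild-type residue."""
    residues = list(mature_sequence)
    for wild_type, position, replacement in parse_variant(name):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_variant("N62D/G166D"))
print(apply_variant("ANGA", "N2D/G3D"))  # toy sequence, not subtilisin
```

Checking the wild-type letter before substituting catches numbering mismatches, which matters when the same notation is applied to a precursor other than subtilisin BPN'.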

An amino acid residue of a precursor carbonyl hydrolase is "equivalent" to a residue of B.
amyloliquefaciens subtilisin if it is either homologous (i.e., corresponding in position in either primary or tertiary
structure) or analogous to a specific residue or portion of that residue in B. amyloliquefaciens subtilisin (i.e.,
having the same or similar functional capacity to combine, react, or interact chemically).

In order to establish homology to primary structure, the amino acid sequence of a precursor carbonyl
hydrolase is directly compared to the B. amyloliquefaciens subtilisin primary sequence and particularly to a set
of residues known to be invariant in all subtilisins for which the sequences are known (see e.g. Figure 5-C in EPO
0251446). After aligning the conserved residues, allowing for necessary insertions and deletions in order to
maintain alignment (i.e., avoiding the elimination of conserved residues through arbitrary deletion and insertion),
the residues equivalent to particular amino acids in the primary sequence of B. amyloliquefaciens subtilisin are
defined. Alignment of conserved residues should conserve 100% of such residues. However, alignment of
greater than 75% or as little as 50% of conserved residues is also adequate to define equivalent residues.
Conservation of the catalytic triad, Asp32/His64/Ser221, is required.

Equivalent residues homologous at the level of tertiary structure for a precursor carbonyl hydrolase whose
tertiary structure has been determined by x-ray crystallography are defined as those for which the atomic
coordinates of 2 or more of the main chain atoms of a particular amino acid residue of the precursor carbonyl
hydrolase and B. amyloliquefaciens subtilisin (N on N, CA on CA, C on C, and O on O) are within 0.13 nm and
preferably 0.1 nm after alignment. Alignment is achieved after the best model has been oriented and positioned
to give the maximum overlap of atomic coordinates of non-hydrogen protein atoms of the carbonyl hydrolase
in question to the B. amyloliquefaciens subtilisin. The best model is the crystallographic model giving the lowest
R factor for experimental diffraction data at the highest resolution available.


$$R\ \text{factor} = \frac{\sum_h \bigl|\,|F_o(h)| - |F_c(h)|\,\bigr|}{\sum_h |F_o(h)|}$$
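
The R factor used to select the best crystallographic model is conventionally the sum over reflections h of the absolute difference between observed and calculated structure factor amplitudes, divided by the sum of the observed amplitudes. A minimal numerical sketch follows; the function name and the sample amplitudes are illustrative assumptions.

```python
def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """Crystallographic R factor from paired structure factor amplitudes.

    f_obs  -- observed amplitudes |Fo(h)|, one per reflection h
    f_calc -- calculated amplitudes |Fc(h)|, in the same reflection order
    """
    if not f_obs or len(f_obs) != len(f_calc):
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


# Two made-up reflections: (|10 - 9| + |20 - 22|) / (10 + 20)
print(r_factor([10.0, 20.0], [9.0, 22.0]))
```

A lower value indicates closer agreement between the model and the diffraction data, which is why the text selects the model with the lowest R factor at the highest available resolution.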


Equivalent amino acid residues of subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K from tertiary
structure analysis are provided in, for example, Siezen, et al., (1991) Prot. Eng. 4:719-737.

Equivalent residues which are functionally analogous to a specific residue of B. amyloliquefaciens
subtilisin are defined as those amino acids of the precursor carbonyl hydrolases which may adopt a conformation
such that they either alter, modify or contribute to protein structure, substrate binding or catalysis in a manner
defined and attributed to a specific residue of the B. amyloliquefaciens subtilisin as described herein. Further,
they are those residues of the precursor carbonyl hydrolase (for which a tertiary structure has been obtained by
x-ray crystallography), which occupy an analogous position to the extent that although the main chain atoms of
the given residue may not satisfy the criteria of equivalence on the basis of occupying a homologous position,
the atomic coordinates of at least two of the side chain atoms of the residue lie within 0.13 nm of the
corresponding side chain atoms of B. amyloliquefaciens subtilisin. The three dimensional structures would be
aligned as outlined above.
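
The two distance criteria for structural equivalence (at least two main chain atoms, or failing that at least two side chain atoms, within 0.13 nm after alignment) can be sketched as follows. This is illustrative only: it assumes the two structures have already been superimposed, that coordinates are in nanometres, and that "corresponding" side chain atoms are those sharing an atom name. All function names and the sample coordinates are hypothetical.

```python
import math

Atom = tuple[float, float, float]
MAIN_CHAIN = ("N", "CA", "C", "O")


def count_within(res_a: dict[str, Atom], res_b: dict[str, Atom],
                 names, cutoff_nm: float) -> int:
    """Count named atoms present in both residues that lie within cutoff_nm."""
    return sum(
        1 for name in names
        if name in res_a and name in res_b
        and math.dist(res_a[name], res_b[name]) <= cutoff_nm
    )


def is_homologous(res_a, res_b, cutoff_nm: float = 0.13) -> bool:
    """Two or more main chain atoms within the cutoff after alignment."""
    return count_within(res_a, res_b, MAIN_CHAIN, cutoff_nm) >= 2


def is_analogous(res_a, res_b, cutoff_nm: float = 0.13) -> bool:
    """At least two same-named side chain atoms within the cutoff after alignment."""
    side_chain = [n for n in res_a if n not in MAIN_CHAIN]
    return count_within(res_a, res_b, side_chain, cutoff_nm) >= 2


# Made-up coordinates (nm) for one residue and a copy shifted by 0.05 nm in x.
ref = {"N": (0.0, 0.0, 0.0), "CA": (0.15, 0.0, 0.0),
       "C": (0.25, 0.1, 0.0), "O": (0.3, 0.2, 0.0)}
shifted = {name: (x + 0.05, y, z) for name, (x, y, z) in ref.items()}
print(is_homologous(ref, shifted))
```

Passing `cutoff_nm=0.1` applies the stricter, preferred threshold mentioned for main chain atoms.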

Some of the residues identified for substitution are conserved residues whereas others are not. In the case
of residues which are not conserved, the replacement of one or more amino acids is limited to substitutions which
produce a mutant which has an amino acid sequence that does not correspond to one found in nature. In the case
of conserved residues, such replacements should not result in a naturally occurring sequence. The subtilisin
mutants of the present invention include the mature forms of subtilisin mutants as well as the pro- and
prepro-forms of such subtilisin mutants. The prepro-forms are the preferred construction since this facilitates the
expression, secretion and maturation of the subtilisin mutants.

"Prosequence" refers to a sequence of amino acids bound to the N-terminal portion of the mature form of
a subtilisin which when removed results in the appearance of the "mature" form of the subtilisin. Many
proteolytic enzymes are found in nature as translational proenzyme products and, in the absence of
post-translational processing, are expressed in this fashion. The preferred prosequence for producing subtilisin
mutants, specifically subtilisin BPN' mutants, is the putative prosequence of B. amyloliquefaciens subtilisin,
although other subtilisin prosequences may be used. For example, when the substrate specificity of the precursor
subtilisin is altered according to the present invention, this alteration may affect the ability of the variant
subtilisin to undergo autolytic cleavage of the naturally occurring prosequence. In order to effect the expression
and proper folding of a mature variant subtilisin whose substrate specificity has been altered, it may be necessary
to alter the prosequence to correspond to the new or variant substrate specificity.

As an example, the substrate specificity of a particular subtilisin variant N62D/Y104D/G166D is distinct
from the precursor subtilisin from which it was derived. The subtilisin variant prefers substrates containing basic
residues at substrate positions corresponding to P4, P2, and P1. According to this aspect of the present invention,
the precursor prosequence which was efficiently autolysed by the precursor subtilisin is altered to correspond
to the substrate specificity of the variant subtilisin. Therefore, for the subtilisin variant N62D/Y104D/G166D the
prosequence would be altered to contain basic residues at positions -4, -2, and -1.

A "signal sequence" or "presequence" refers to any sequence of amino acids bound to the N-terminal
portion of a subtilisin or to the N-terminal portion of a prosubtilisin which may participate in the secretion of the
mature or pro forms of the subtilisin. This definition of signal sequence is a functional one, meant to include all
those amino acid sequences, encoded by the N-terminal portion of the subtilisin gene or other secretable carbonyl
hydrolases, which participate in the effectuation of the secretion of subtilisin or other carbonyl hydrolases under
native conditions. The present invention utilizes such sequences to effect the secretion of the subtilisin mutants
as defined herein.

A "prepro" form of a subtilisin mutant consists of the mature form of the subtilisin having a prosequence
operably linked to the amino-terminus of the subtilisin and a "pre" or "signal" sequence operably linked to the
amino terminus of the prosequence.

"Expression vector" refers to a DNA construct containing a DNA sequence which is operably linked to 
a suitable control sequence capable of effecting the expression of the DNA in a suitable host. Such control 
sequences include a promoter to effect transcription, an optional operator sequence to control such transcription, 
a sequence encoding suitable mRNA ribosome binding sites, and sequences which control termination of
transcription and translation. The vector may be a plasmid, a phage particle, or simply a potential genomic insert.
Once transformed into a suitable host, the vector may replicate and function independently of the host genome,
or may, in some instances, integrate into the genome itself. In the present specification, "plasmid" and "vector"
are sometimes used interchangeably as the plasmid is the most commonly used form of vector at present.
However, the invention is intended to include such other forms of expression vectors which serve equivalent
functions and which are, or become, known in the art. 

The "host cells" used in the present invention generally are procaryotic or eucaryotic hosts which
preferably have been manipulated by the methods disclosed in EPO Publication No. 0130756 or 0251446 or U.S.
Patent No. 5,371,008 to render them incapable of secreting enzymatically active endoprotease. A preferred host
cell for expressing subtilisin is the Bacillus strain BG2036 which is deficient in enzymatically active neutral
protease and alkaline protease (subtilisin). The construction of strain BG2036 is described in detail in EPO
Publication No. 0130756 and further described by Yang, M.Y., et al. (1984) J. Bacteriol. 160:15-21. Such host
cells are distinguishable from those disclosed in PCT Publication No. 03949 wherein enzymatically inactive
mutants of intracellular proteases in E. coli are disclosed. Other host cells for expressing subtilisin include
Bacillus subtilis var. I168 (EPO Publication No. 0130756).

Host cells are transformed or transfected with vectors constructed using recombinant DNA techniques.
Such transformed host cells are capable of either replicating vectors encoding the subtilisin mutants or expressing
the desired subtilisin mutant. In the case of vectors which encode the pre or prepro form of the subtilisin mutant,
such mutants, when expressed, are typically secreted from the host cell into the host cell medium.

"Operably linked" when describing the relationship between two DNA regions simply means that they are
functionally related to each other. For example, a presequence is operably linked to a peptide if it functions as
a signal sequence, participating in the secretion of the mature form of the protein most probably involving
cleavage of the signal sequence. A promoter is operably linked to a coding sequence if it controls the
transcription of the sequence; a ribosome binding site is operably linked to a coding sequence if it is positioned
so as to permit translation.

The genes encoding the naturally-occurring precursor subtilisin may be obtained in accord with the general 
methods described in U.S. Patent No. 4,760,025 or EPO Publication No. 0130756. As can be seen from the 
examples disclosed therein, the methods generally comprise synthesizing labeled probes having putative 
sequences encoding regions of the hydrolase of interest, preparing genomic libraries from organisms expressing
the hydrolase, and screening the libraries for the gene of interest by hybridization to the probes. Positively
hybridizing clones are then mapped and sequenced. 

The cloned subtilisin is then used to transform a host cell in order to express the subtilisin. The subtilisin
gene is then ligated into a high copy number plasmid. This plasmid replicates in hosts in the sense that it contains
the well-known elements necessary for plasmid replication: a promoter operably linked to the gene in question
(which may be supplied as the gene's own homologous promoter if it is recognized, i.e., transcribed, by the host),
a transcription termination and polyadenylation region (necessary for stability of the mRNA transcribed by the
host from the hydrolase gene in certain eucaryotic host cells) which is exogenous or is supplied by the
endogenous terminator region of the subtilisin gene and, desirably, a selection gene such as an antibiotic
resistance gene that enables continuous cultural maintenance of plasmid-infected host cells by growth in
antibiotic-containing media. High copy number plasmids also contain an origin of replication for the host
thereby enabling large numbers of plasmids to be generated in the cytoplasm without chromosomal limitations.
However, it is within the scope herein to integrate multiple copies of the subtilisin gene into the host genome. This
is facilitated by procaryotic and eucaryotic organisms which are particularly susceptible to homologous
recombination.

Once the subtilisin gene has been cloned, a number of modifications are undertaken to enhance the use
of the gene beyond synthesis of the naturally-occurring precursor subtilisin. Such modifications include the
production of recombinant subtilisin as disclosed in U.S. Patent No. 5,371,008 or EPO Publication No. 0130756
and the production of subtilisin mutants described herein.

Mutant design and preparation.

A. Subtilisin Variants Capable of Cleaving Substrates Having Dibasic Residues. 

For the preparation of subtilisin variants capable of cleaving substrates containing dibasic residues, the 
following analysis was undertaken. 

A number of structures have been solved of subtilisin with a variety of inhibitors and transition state
analogs bound (Wright, C. S., Alden, R. A. and Kraut, J. (1969) Nature, 221:235-242; McPhalen, C.A. and
James, N.G. (1988) Biochemistry, 27:6582-6598; Bode, W., Papamokos, E., Musil, D., Seemueller, U. and Fritz,
M. (1986) EMBO J., 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A., Graycar, T., Katz, B. and Power, S.
(1988) J. Biol. Chem., 263:7895-7906). One of these structures, Figure 1, was used to locate residues that are
in close proximity to side chains at the P1 and P2 positions from the substrate. Previous work had shown that
replacement of residues at positions 156 and 166 in the S1 binding site with various charged residues led to
improved specificity for complementary charged substrates (Wells, J. A., Powers, D. B., Bott, R. R., Graycar,
T. P. and Estell, D. A. (1987) Proc. Natl. Acad. Sci. USA, 84:1219-1223). Although longer range electrostatic
effects on substrate specificity have been noted (Russell, A. J. and Fersht, A. R. (1987) Nature, 328:496-500)
these were generally much smaller than local ones. Therefore, it seemed reasonable that local differences in
charge between subtilisin BPN' and the eukaryotic enzymes may account for the differences in specificity.

A detailed sequence alignment of 35 different subtilisin-like enzymes (Siezen, R. J., de Vos, W. M.,
Leunissen, A. M., and Dijkstra, B. W. (1991) Prot. Eng., 4:719-737) allowed us to identify differences between
subtilisin BPN' and the eukaryotic processing enzymes, KEX2, furin and PC2. Within the S1 binding pocket
there are a number of charged residues that appear in the pro-hormone processing enzymes and not in subtilisin
BPN' (Table 1A).

TABLE 1A
S1 subsite

                   125-131*                   151-157                    163-168

Subtilisin BPN'    SLGGPSG (SEQ ID NO: 3)     AAAGNEG (SEQ ID NO: 4)     ST-VGYP (SEQ ID NO: 5)
KEX2               SWGPADD (SEQ ID NO: 6)     FASGNGG (SEQ ID NO: 7)     CNYDGYT (SEQ ID NO: 8)
Furin              SWGPEDD (SEQ ID NO: 9)     WASGNGG (SEQ ID NO: 10)    CNCDGYT (SEQ ID NO: 11)
PC2                SWGPADD (SEQ ID NO: 6)     WASGDGG (SEQ ID NO: 12)    CNCDGYA (SEQ ID NO: 13)

* numbering according to subtilisin BPN' sequence

For example, the eukaryotic enzymes have two conserved Asp residues at 130 and 131 as well as an Asp at 165
that is preceded by insertion of a Tyr or Cys. However, in the region from 151-157, subtilisin BPN' contains a
Glu and the eukaryotes a conserved Gly.

In the S2 binding site there were two notable differences in sequence (Table 1B).

TABLE 1B
S2 subsite

                   30-35                      60-64

Subtilisin BPN'    VIDSGI (SEQ ID NO: 14)     DNNSH (SEQ ID NO: 15)
KEX2               IVDDGL (SEQ ID NO: 16)     SDDYH (SEQ ID NO: 17)
Furin              ILDDGI (SEQ ID NO: 18)     NDNRH (SEQ ID NO: 19)
PC2                IMDDGI (SEQ ID NO: 20)     WFNSH (SEQ ID NO: 21)

Subtilisin BPN' contains a Ser at position 33 whereas the pro-hormone processing enzymes contain Asp. There
is not as clear a consensus in the region of 60-64, but one notable difference is at position 62. This side chain
which points directly at the P2 side chain (Figure 1) is Asn in subtilisin BPN', furin and PC2 but Asp in KEX2.
Thus, not all substitutions were clearly predictive of the specificity differences.

A variety of mutants were produced to probe and engineer the specificity of subtilisin BPN' using
oligonucleotides described in Table 2.

TABLE 2

Oligonucleotides used for site-directed mutagenesis on subtilisin

Mutant                  Oligonucleotide                               Specificity   Activity
                                                                      Pocket        Expressed

S33D                    5'-GCGGTTATCGACG*A*CGGTATCGATTCT-3'           S2
                        (SEQ ID NO: 22)

S33K                    5'-GCGGTTATCGACAA*A*G*GTATCGATTCT-3'          S2            +
                        (SEQ ID NO: 23)

S33E                    5'-GCGGTTATCGACG*A*A*GGTATCGATTCT-3'          S2            +
                        (SEQ ID NO: 24)

N62D                    5'-CCAAGACAACG*ACTCTCACGGAA-3'                S2            +
                        (SEQ ID NO: 25)

N62S                    5'-CCAAGACAACAG*CTCTCACGGAA-3'                S2            +
                        (SEQ ID NO: 26)

N62K                    5'-CCAAGACAACAAA*TCTCACGGAA-3'                S2            +
                        (SEQ ID NO: 27)

G166D                   5'-ACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*C*T         S1            +
                        ACCCTGGCAAATA-3'
                        (SEQ ID NO: 28) (Inserts Sal I site)

G166E                   5'-CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*GT         S1            +
                        ACCCTGGCAAATA-3'
                        (SEQ ID NO: 29) (Inserts Sal I site)

G128P/P129A             5'-TTAACATGAGCCTCGGCC*C*AG*CTA*G*C*GGT        S1
                        TCTGCTGCTTTA-3'
                        (SEQ ID NO: 30) (Inserts Nhe I site)

G128P/P129A/            5'-TTAACATGAGCCTCGGCC*C*C*G*CGG*A*TGA*        S1
S130D/G131D             TTCTGCTGCTTTAAA-3'
                        (SEQ ID NO: 31) (Inserts Sac II site)

T164N/V165D             5'-CGGCAGCTCAAGCA*A*C*G*A*T*GGCTAT*CCT        S1
                        GGCAAATACCCTTCTGTCA-3'
                        (SEQ ID NO: 32) (Inserts BsaBI site)

T164Y/V165D             5'-CGGCAGCTCAAGCA*A*C*G*A*T*GGCTAT*CCT        S1
                        GGCAAATACCCTTCTGTCA-3'
                        (SEQ ID NO: 33) (Inserts BsaBI site)

T164N-Y(insert)-        5'-ACTTCCGGCAGCTCT*T*C*G*AA*C*T*A*C*G*A*      S1
V165D                   C*GGGTACCCTGGCAAATA-3'
                        (SEQ ID NO: 34) (Inserts BstBI site)

N62D/G166D              See individual mutations                      S1/S2         +

N62D/G166E              See individual mutations                      S1/S2

* Asterisks indicate base changes from the pSS5 (wild-type) template.
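
The asterisk notation of Table 2 can be read mechanically. The following is a minimal sketch, assuming each asterisk follows the base it marks (as in the N62D and N62S rows, where the changed codon base precedes the asterisk); the function name parse_marked_oligo is illustrative only and is not part of the disclosure.

```python
def parse_marked_oligo(marked: str) -> tuple[str, list[int]]:
    """Split an asterisk-marked oligonucleotide into its plain sequence and
    the 0-based indices of the bases flagged as changed.

    Assumption: an asterisk marks the base immediately before it.
    """
    bases: list[str] = []
    changed: list[int] = []
    for ch in marked:
        if ch == "*":
            if not bases:
                raise ValueError("asterisk appears before any base")
            changed.append(len(bases) - 1)
        elif ch in "ACGT":
            bases.append(ch)
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return "".join(bases), changed


# N62D oligonucleotide from Table 2 (SEQ ID NO: 25), without the 5'/3' labels.
seq, changed = parse_marked_oligo("CCAAGACAACG*ACTCTCACGGAA")
print(seq, changed)
```

For the N62D row this reports a single changed base, the G that converts the Asn codon AAC into the Asp codon GAC.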


After producing the mutant plasmids they were transformed into a protease deficient strain of B. subtilis
(BG2036) that lacks an endogenous gene for secretion of subtilisin. These were then tested for protease activity
on skim milk plates.

The first set of mutants tested were ones where segments of the S1 binding site were replaced with
sequences from KEX2. None of these segment replacements produced detectable activity on skim milk plates
even though variants of subtilisin whose catalytic efficiencies are reduced by as much as 1000-fold do produce
detectable halos (Wells, J.A., Cunningham, B.C., Graycar, T.P. and Estell, D.A. (1986) Philos. Trans. R. Soc.
Lond. A, 317:415-423). We went on to produce single residue
substitutions that should have less impact on the stability. These mutants at position 166 in the S1 site, and 33
and 62 in the S2 site, were chosen based on the modeling and sequence considerations described above.
Fortunately, all single mutants as well as combination mutants produced activity on skim milk plates and could
be purified to homogeneity.

Kinetic analysis of variant subtilisins.

To probe the effects of the G166E and G166D on specificity at the P1 position we used substrates having
the form suc-AAPX-pna (SEQ ID NO: 69) where X was either Lys (SEQ ID NO. 58), Arg (SEQ ID NO. 59),
Phe (SEQ ID NO. 56), Met (SEQ ID NO. 60) or Gln (SEQ ID NO. 61). The kcat/Km values were determined
from initial rate measurements and results reported in Figure 2. Whereas the wild-type enzyme preferred
Phe>Met>Lys>Arg>Gln, the G166E preferred Lys~Phe>Arg~Met>Gln, and G166D preferred
Lys>Phe~Arg~Met>Gln. Thus, both the acidic substitutions at position 166 caused a shift in preference for basic
residues at the P1 site, as previously reported (Wells, J. A., Powers, D. B., Bott, R. R., Graycar, T. P. and Estell,
D. A. (1987a), Proc. Natl. Acad. Sci. USA 84:1219-1223).
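
The text states only that kcat/Km was determined from initial rates; it does not give the procedure. One common approach, shown here as a sketch under that assumption, uses the low-substrate limit of the Michaelis-Menten equation, where v0 is approximately (kcat/Km)[E][S], and takes the slope of v0 against [S]. All concentrations and rates below are hypothetical illustration values, not data from the specification.

```python
def kcat_over_km(substrate_conc, initial_rates, enzyme_conc):
    """Estimate kcat/Km (M^-1 s^-1) from initial rates measured at [S] << Km.

    Fits v0 = slope * [S] by least squares through the origin, then divides
    the slope by the total enzyme concentration.
    """
    if len(substrate_conc) != len(initial_rates) or not substrate_conc:
        raise ValueError("need matching, non-empty concentration and rate lists")
    num = sum(s * v for s, v in zip(substrate_conc, initial_rates))
    den = sum(s * s for s in substrate_conc)
    return num / den / enzyme_conc


# Hypothetical measurements: [S] in M, v0 in M/s, enzyme at 10 nM.
conc = [1e-5, 2e-5, 4e-5]
rates = [1e-8, 2e-8, 4e-8]
estimate = kcat_over_km(conc, rates, enzyme_conc=1e-8)
print(f"kcat/Km ~ {estimate:.3g} M^-1 s^-1")
```

The approximation holds only while the substrate concentration stays well below Km; otherwise a full Michaelis-Menten fit would be needed.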

The effects of single and double substitutions in the S2 binding site were analyzed with substrates having
the form suc-Ala-Ala-Xaa-Phe-pna (SEQ ID NO. 70) and are shown in Figure 3. At the P2 position the wild-type
enzyme preferred Ala>Pro>Lys>Arg>Asp. In contrast, the S33D preferred Ala>Lys~Arg~Pro>Asp and
the N62D preferred Lys>Ala>Arg>Pro>Asp. Although the effects were more dramatic for the N62D mutant,
the S33D variant also showed significant improvement toward basic P2 residues and corresponding reduction
in hydrolysis of the Ala and Asp P2 substrates. We then analyzed the double mutant, but found it exhibited the
catalytic efficiency of the worse of the two single mutants for each of the substrates tested.

Despite the less than additive effects seen for the two charged substitutions in the S2 site, we decided to
combine the best S2 site variant (N62D) with either of the acidic substitutions in the S1 site. The two double
mutants, N62D/G166E and N62D/G166D, were analyzed with substrates having the form suc-AAXX-pna (SEQ
ID NO. 71) where XX was either KK (SEQ ID NO. 66), KR (SEQ ID NO. 67), KF (SEQ ID NO. 62), PK (SEQ
ID NO. 58), PF (SEQ ID NO. 56) or AF (SEQ ID NO. 63) (Figure 4). The wild-type preference was
AF>PF~KF>KK~PK>KR, whereas the double mutants had the preference KK>KR>KF>PK~AF>PF. Thus for
the double mutants there was a dramatic improvement toward cleavage of dibasic substrates and away from
cleaving the hydrophobic substrates.

The greater than additive effect (or synergy) of these mutants can be seen from ratios of the catalytic
efficiencies for the single and multiple mutants. For example, the G166E variant cannot distinguish Lys from
Phe at the P1 position. Yet the N62D/G166E variant cleaves the Lys-Lys substrate about 8 times faster than the
Lys-Phe substrate. Similarly the G166D cleaves the Lys P1 substrate about 3 times faster than the Phe P1
substrate, but the N62D/G166D double mutant cleaves a Lys-Lys substrate 18 times faster than a Lys-Phe
substrate. Thus, as opposed to the reduction in specificity seen for the double mutant in the S2 site, the S1-S2
double mutants enhance specificity for basic residues. It is possible that these two sites bind the dibasic
substrates in a cooperative manner analogous to a chelate effect.
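
The comparison above reduces to a ratio of ratios. The sketch below works it through using only the figures quoted in the text; treating "cannot distinguish Lys from Phe" as a ratio of about 1 is an assumption made for illustration, and the name specificity_gain is hypothetical.

```python
def specificity_gain(double_mutant_ratio: float, single_mutant_ratio: float) -> float:
    """Fold increase in Lys-over-Phe preference at P1 on adding N62D."""
    return double_mutant_ratio / single_mutant_ratio


# Lys/Phe rate ratios at P1 as quoted in the text.
# "single" is the S1 mutant alone; "double" is the same mutant combined with N62D.
ratios = {
    "G166E": {"single": 1.0, "double": 8.0},   # single ~1 is an assumption
    "G166D": {"single": 3.0, "double": 18.0},
}

gains = {
    name: specificity_gain(r["double"], r["single"]) for name, r in ratios.items()
}
for name, gain in gains.items():
    print(f"N62D added to {name}: about {gain:.0f}-fold further preference for Lys at P1")
```

On these figures the N62D substitution raises the P1 preference for Lys roughly 8-fold in the G166E background and 6-fold in the G166D background, even though N62D lies in the S2 site.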

Therefore, according to the present invention, subtilisin mutants having a preference for dibasic residues
are preferred. According to this aspect of the present invention, substitutions of amino acids corresponding to
amino acids N62 and G166 of subtilisin BPN' produced from Bacillus amyloliquefaciens are prepared. In
particular, amino acids 62 and 166, or their equivalents, in the precursor subtilisin are substituted with amino
acid residues Asp or Glu. Preferred subtilisin variants according to this aspect of the invention include
N62D/G166D, N62E/G166E, N62E/G166D, and N62D/G166E variants of subtilisin BPN' and their equivalents.

B. Subtilisin Variants Capable of Cleaving Substrates Having Tribasic Residues

For the preparation of subtilisin variants specific for substrates containing a third basic residue at substrate
position P4 we used the crystal structure of subtilisin BPN' complexed with Ala-Ala-Pro-Phe-Boronate (SEQ ID
NO: 56) (Figure 7) in combination with sequence alignments of subtilisin BPN', KEX2, Furin, PC2, and P
(Table 3) in designing basic specificity into the S1, S2 and S4 subsites. The two subtilisin BPN' residues
that most prominently display their side chains into the S4 pocket are Y104 and I107 (Figure 7).

Sequence alignments of subtilisin BPN' and the mammalian prohormone-processing proteases (Siezen,
R. J., de Vos, W. M., Leunissen, A. M., and Dijkstra, B. W. (1991) Prot. Eng., 4:719-737) (Table 3) reveal that
position 104 is conserved as Asp, and 107 as Glu in the prohormone converting (Arg-P4 specific) enzymes.
Therefore these two mutations were introduced either individually or in combination into the dibasic-specific
N62D/G166D subtilisin BPN' background (Table 4).

Table 3 Sequence alignments for the S4 site of subtilisins

                   S4 Site
                   100-110

Subtilisin         GSGQYSWIING    (SEQ ID NO: 77)
KEX2               GDITTEDEAAS    (SEQ ID NO: 78)
Furin              GEVTDAVEARS    (SEQ ID NO: 79)
PC2                PFMTDIIEASS    (SEQ ID NO: 80)
P                  GIVTDAIEASS    (SEQ ID NO: 81)

Table 4 describes oligonucleotides used for site-directed mutagenesis, protein regions affected by the
mutations, and relative expression of protein for N62D/G166D subtilisin BPN' variants. Bold type indicates
base changes from the pSS5 (N62D/G166D) template. For "Protein Expressed," indicates a high level
of expression of mature enzyme in crude culture medium, and indicates no enzyme detectable.

TABLE 4

Protein                 Oligonucleotide                               Protein
                                                                      Expressed

Y104D                   5'-GGTTCCGGCCAAGATAGCTGGATCATT-3'
                        (SEQ ID NO: 82)

I107E                   5'-CCAATACAGCTGGGAAATTAACGGAATCG-3'
                        (SEQ ID NO: 83)

Y104D/I107E             5'-GGTTCCGGCCAAGATAGCTGGGAAATTAACG
                        GAATCGA-3' (SEQ ID NO: 84)

A(-4)R/                 5'-AAGAAGATCACGTAAGACATAAGCGCGCGC
A(-2)K/                 AGTCCGTGC-3' (SEQ ID NO: 85)
Y(-1)R

Y104D/                  See individual mutations
A(-4)R/
A(-2)K/
Y(-1)R

I107E/                  See individual mutations
A(-4)R/
A(-2)K/
Y(-1)R

Y104D/I107E/            See individual mutations
A(-4)R/
A(-2)K/
Y(-1)R

Initial attempts to express the triple mutants in Bacillus were unsuccessful, as indicated by SDS-PAGE
of crude supernatants. We reasoned that the source of the expression problem could lie in the fact that
correct folding and maturation of subtilisin requires autolytic cleavage of its propeptide (Power, S.D.,
Adams, R. M., and Wells, J. A. (1986) Proc. Natl. Acad. Sci. USA 83, 3096-3100). The processing site in
the wild-type enzyme has a sequence that is optimized for the natural substrate preference, AHAY↓A (↓
denotes the site of cleavage). Although the N62D/G166D subtilisin can still autolyze itself with the
wild-type processing site, the additional S4 pocket mutations could reduce the cleavage to the point where
expression was lowered to a minute level.
To test whether the mutants were expressed poorly due to an inability to autolytically process themselves,
mutations in the processing site were simultaneously incorporated to accommodate the changes in substrate
specificity. Thus the sequence from positions -4 to -1 was changed from AHAY to RHKR in combination
with the S4 site mutations. For N62D/Y104D/G166D, high levels of expression could then be achieved
providing an indication that the additional Y104D mutation induced an especially strong preference for P4
Arg over Ala. Variants containing the I107E mutation, however, could not be expressed even with the
change in the processing site.
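
The processing-site change can be checked mechanically. The following minimal sketch applies the three prosequence substitutions listed in Table 4, A(-4)R, A(-2)K and Y(-1)R, to the wild-type site AHAY; the function name and its error handling are illustrative assumptions, not part of the disclosure.

```python
import re


def mutate_processing_site(site: str, mutations: list[str]) -> str:
    """Apply prosequence substitutions written as 'A(-4)R' to a processing site.

    `site` holds the residues ending at position -1, so its last character is
    position -1 and its first is position -len(site).
    """
    residues = list(site)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])\((-\d+)\)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = len(residues) + position  # e.g. -4 in a 4-residue site -> index 0
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} lies outside the site")
        if residues[index] != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type}, found {residues[index]}")
        residues[index] = replacement
    return "".join(residues)


altered = mutate_processing_site("AHAY", ["A(-4)R", "A(-2)K", "Y(-1)R"])
print(altered)
```

The result places basic residues at -4, -2 and -1 while leaving His at -3 untouched, matching the RHKR site described above.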

Kinetic analysis of variant subtilisins 

The mature N62D/Y104D/G166D variant was purified and analyzed for its ability to hydrolyze several
tetrapeptide-pNA substrates. Table 5 displays the results along with data for the N62D/G166D mutant and
wild-type subtilisin.

The tribasic substrates succinyl-RAKR-pNA (SEQ ID NO: 86) and succinyl-KAKR-pNA (SEQ ID
NO: 87) were hydrolyzed with high catalytic efficiency (kcat/Km) by the triple mutant, at a level similar to
wild-type subtilisin versus one of its best substrates, succinyl-AAPF-pNA (SEQ ID NO: 56). In contrast,
the dibasic substrate succinyl-AAKR-pNA (SEQ ID NO: 67) was hydrolyzed 60-fold less efficiently, mostly
due to diminution of kcat. This indicates a dramatic specificity change from the wild-type preference at P4,
at which hydrophobic residues are strongly favored over charged side chains (Grøn, H. and Breddam, K.
(1992) Biochemistry 31, 8967-8971). In fact N62D/G166D subtilisin appears to cleave at an alternate site
in the succinyl-RAKR-pNA (SEQ ID NO: 86) substrate, indicating that Arg was not accepted in its wild-type
S4 site.

The large magnitude of the combined specificity changes in the N62D/Y104D/G166D variant is
evidenced by its strong discrimination against substrates that are preferred by the wild-type enzyme. For
example, succinyl-AAPF-pNA (SEQ ID NO: 56) is hydrolyzed 6 x 10^4-fold less efficiently than succinyl-RAKR-pNA
(SEQ ID NO: 86). Clearly, the S4 site mutation greatly improves upon the discriminatory
power of the parent dibasic-specific N62D/G166D subtilisin, where the ratio of catalytic efficiency for
succinyl-AAKR-pNA versus succinyl-AAPF-pNA is 1.9 x 10^2. The improvement in discrimination (310-fold)
is also higher than would be predicted from the data for hydrolysis of succinyl-RAKR-pNA (SEQ ID
NO: 86) versus succinyl-AAKR-pNA (SEQ ID NO: 67) by the triple mutant (a 60-fold effect).
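
The arithmetic behind this comparison can be sketched as follows. Only the rounded figures quoted in the text are used, so the quotient comes out near 316 rather than the stated 310-fold, which was presumably computed from unrounded data; the variable names are illustrative.

```python
# Discrimination ratios (kcat/Km of preferred substrate over succinyl-AAPF-pNA).
triple_mutant_ratio = 6e4    # N62D/Y104D/G166D: RAKR versus AAPF
parent_ratio = 1.9e2         # N62D/G166D: AAKR versus AAPF

# Gain in discrimination attributable to adding the S4 site mutation.
improvement = triple_mutant_ratio / parent_ratio

# Effect expected from the triple mutant's own RAKR versus AAKR preference.
predicted_from_p4_effect = 60.0

print(f"improvement in discrimination: about {improvement:.0f}-fold")
print(f"excess over the 60-fold P4 effect: about {improvement / predicted_from_p4_effect:.1f}-fold")
```

On these numbers the observed gain exceeds the 60-fold P4 effect by roughly a factor of five.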

Therefore in order to produce subtilisin variants capable of cleaving substrates containing basic
residues at positions P4, P2, and P1, additional site specific substitutions are made in the dibasic specific
subtilisin variants. According to this aspect of the invention, substitution of the amino acid corresponding
to Y104 of subtilisin BPN' produced by Bacillus amyloliquefaciens, i.e., amino acid 104 of subtilisin BPN'
or its equivalent, produces a variant having substantially altered substrate specificity. In a preferred
embodiment of the present invention amino acids corresponding to N62, Y104, and G166 of subtilisin BPN'
are substituted with acidic amino acids, preferably Asp and Glu and most preferably Asp. Subtilisin BPN'
variants N62D/Y104D/G166D, N62D/Y104E/G166D, N62E/Y104D/G166E, N62E/Y104E/G166E,
N62E/Y104D/G166D, N62E/Y104E/G166D, N62D/Y104E/G166E, and N62D/Y104D/G166E, and their
equivalents are preferred. Most preferred among this group of subtilisin variants are the
N62D/Y104D/G166D subtilisin BPN' variants and their equivalents.
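
The eight preferred variants are simply every combination of Asp or Glu at the three positions. A minimal sketch that enumerates them, so the list above can be checked for completeness (the variable names are illustrative):

```python
from itertools import product

# Wild-type residue and position for each substituted site in subtilisin BPN'.
sites = [("N", 62), ("Y", 104), ("G", 166)]
acidic_residues = ("D", "E")  # Asp, Glu

variants = [
    "/".join(f"{wild_type}{position}{new}" for (wild_type, position), new in zip(sites, combo))
    for combo in product(acidic_residues, repeat=len(sites))
]

for variant in variants:
    print(variant)
```

Two residues at three positions give 2^3 = 8 variants, with the most preferred N62D/Y104D/G166D generated first.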

Mutagenesis and Synthetic Techniques 

Various techniques are available which may be employed to produce mutant DNA, which can encode
the subtilisin variants of the present invention. For instance, it is possible to derive mutant DNA based on
naturally occurring DNA sequences that encode for changes in an amino acid sequence of the resultant
protein relative to a precursor subtilisin. These mutant DNA can be used to obtain the variants of the present
invention.

According to the invention, specific residues of B. amyloliquefaciens subtilisin are identified for
substitution. These amino acid residue position numbers refer to those assigned to the B. amyloliquefaciens
subtilisin sequence (see the mature sequence in Fig. 1 of U.S. Patent No. 4,760,025). The invention,
however, is not limited to the mutation of this particular subtilisin but extends to precursor subtilisins
containing amino acid residues which are equivalent, as defined herein, to the particular identified residues
in B. amyloliquefaciens subtilisin. Equivalent amino acids can be found in, for instance, subtilisin Carlsberg
from Bacillus licheniformis (Bode et al., (1986) EMBO J., 5:813-818); thermitase from Thermoactinomyces
vulgaris (Gros et al., (1989) J. Mol. Biol. 210:347-367); and proteinase K from Tritirachium album (Betzel
et al., (1988) Acta Crystallogr., B, 44:163-172), as described by Siezen et al., (1991) Prot. Eng. 4:719-737.

By way of illustration, with expression vectors encoding the precursor subtilisin in hand (see for
example U.S. Patent No. 4,760,025) site specific mutagenesis (Kunkel et al., (1991) Methods Enzymol.
204:125-139; Carter, P., et al., (1986) Nucl. Acids Res. 13:4331; Zoller, M. J. et al., (1982) Nucl. Acids
Res. 10:6487), cassette mutagenesis (Wells, J. A., et al., (1985) Gene 34:315), restriction selection
mutagenesis (Wells, J. A., et al., (1986) Philos. Trans. R. Soc. London Ser. A 317, 415) or other known
techniques may be performed on the DNA. The mutant DNA can then be used in place of the parent DNA
by insertion into the appropriate expression vectors. Growth of host bacteria containing the expression
vectors with the mutant DNA allows the production of variants which can be isolated as described herein.

Oligonucleotide-mediated mutagenesis is a preferred method for preparing the variants of the present
invention. This technique is well known in the art as described by Adelman et al., (1983) DNA, 2:183.
Briefly, the native or unaltered DNA of a precursor subtilisin, for instance subtilisin BPN', is altered by
hybridizing an oligonucleotide encoding the desired mutation to a DNA template, where the template is the
single-stranded form of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the
precursor.

After hybridization, a DNA polymerase is used to synthesize an entire second complementary strand
of the template that will thus incorporate the oligonucleotide primer, and will code for the selected alteration
in the DNA.

Generally, oligonucleotides of at least 25 nucleotides in length are used. An optimal oligonucleotide
will have 12 to 15 nucleotides that are completely complementary to the template on either side of the
nucleotide(s) coding for the mutation. This ensures that the oligonucleotide will hybridize properly to the
single-stranded DNA template molecule. The oligonucleotides are readily synthesized using techniques
known in the art such as those described by Crea et al. (1978) Proc. Natl. Acad. Sci. USA, 75:5765.
Exemplary oligonucleotide sequences for introducing amino acid changes into precursor subtilisin BPN' are
provided in Tables 2 and 4.
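
The length and flank rules above lend themselves to a simple check. The sketch below builds a mutagenic oligonucleotide from a template, a span to replace and the replacement bases; the function name, its parameters and the repeating placeholder template are hypothetical and serve only to illustrate the 25-nucleotide minimum and the 12 to 15 nucleotide matching flanks. The sequence is written in the same sense as the template; which strand is actually synthesized depends on the single-stranded template used.

```python
def design_mutagenic_oligo(template: str, start: int, end: int, replacement: str,
                           flank: int = 15, min_length: int = 25) -> str:
    """Return template[start:end] replaced by `replacement`, with `flank`
    unchanged template bases on each side.

    Enforces the rules stated in the text: 12 to 15 matching nucleotides on
    either side of the change and a total length of at least 25 nucleotides.
    """
    if not 12 <= flank <= 15:
        raise ValueError("flank should be 12 to 15 nucleotides")
    if start - flank < 0 or end + flank > len(template):
        raise ValueError("template too short for the requested flanks")
    oligo = template[start - flank:start] + replacement + template[end:end + flank]
    if len(oligo) < min_length:
        raise ValueError("oligonucleotide shorter than the minimum length")
    return oligo


# Placeholder template (not a subtilisin sequence): replace one codon with GAC (Asp).
template = "ACGT" * 15
oligo = design_mutagenic_oligo(template, start=30, end=33, replacement="GAC")
print(oligo, len(oligo))
```

With 15-nucleotide flanks and a single codon change the oligonucleotide is 33 nucleotides long, comfortably above the stated minimum.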

Single-stranded DNA template may also be generated by denaturing double-stranded plasmid (or
other) DNA using standard techniques.

For alteration of the native DNA sequence (to generate amino acid sequence variants, for example),
the oligonucleotide is hybridized to the single-stranded template under suitable hybridization conditions.
A DNA polymerizing enzyme, usually the Klenow fragment of DNA polymerase I, is then added to
synthesize the complementary strand of the template using the oligonucleotide as a primer for synthesis. A
heteroduplex molecule is thus formed such that one strand of DNA encodes the variant form of the subtilisin,
and the other strand (the original template) encodes the native, unaltered sequence of the precursor subtilisin.
This heteroduplex molecule is then transformed into a suitable host cell. After the cells are grown, they are
plated onto agarose plates and screened using the oligonucleotide primer radiolabeled with 32-phosphate to
identify the bacterial colonies that contain the mutated DNA. The mutated region is then removed and
placed in an appropriate vector for protein production, generally an expression vector of the type typically
employed for transformation of an appropriate host.

The method described immediately above may be modified such that a homoduplex molecule is
created wherein both strands of the plasmid contain the mutation(s). The modifications are as follows: The
single-stranded oligonucleotide is annealed to the single-stranded template as described above. A mixture
of three deoxyribonucleotides, deoxyriboadenosine (dATP), deoxyriboguanosine (dGTP), and
deoxyribothymidine (dTTP), is combined with a modified thio-deoxyribocytosine called dCTP-(αS) (which
can be obtained from Amersham Corporation). This mixture is added to the template-oligonucleotide
complex. Upon addition of DNA polymerase to this mixture, a strand of DNA identical to the template
except for the mutated bases is generated. In addition, this new strand of DNA will contain dCTP-(αS)
instead of dCTP, which serves to protect it from restriction endonuclease digestion.

After the template strand of the double-stranded heteroduplex is nicked with an appropriate restriction
enzyme, the template strand can be digested with ExoIII nuclease or another appropriate nuclease past the
region that contains the site(s) to be mutagenized. The reaction is then stopped to leave a molecule that is
only partially single-stranded. A complete double-stranded DNA homoduplex is then formed using DNA
polymerase in the presence of all four deoxyribonucleotide triphosphates, ATP, and DNA ligase. This
homoduplex molecule can then be transformed into a suitable host cell as described above.

DNA encoding variants with more than one amino acid to be substituted may be generated in one of
several ways. If the amino acids are located close together in the polypeptide chain, they may be mutated
simultaneously using one oligonucleotide that codes for all of the desired amino acid substitutions. If,
however, the amino acids are located some distance from each other (separated by more than about ten amino
acids), it is more difficult to generate a single oligonucleotide that encodes all of the desired changes.
Instead, one of two alternative methods may be employed.

In the first method, a separate oligonucleotide is generated for each amino acid to be substituted. The
oligonucleotides are then annealed to the single-stranded template DNA simultaneously, and the second
strand of DNA that is synthesized from the template will encode all of the desired amino acid substitutions.

The alternative method involves two or more rounds of mutagenesis to produce the desired mutant. 
The first round is as described for the single mutants: wild-type DNA is used for the template, an 
30 oligonucleotide encoding the first desired amino acid substitution(s) is annealed to mis template, and the 
heteroduplex DNA molecule is men generated. The second round of mutagenesis utilizes the mutated DNA 
produced in the first round of mutagenesis as the template. Thus, this template already contains one or more 
mutations. The oligonucleotide encoding the additional desired amino acid substitution(s) is then annealed 
to this template, and the resulting strand of DNA now encodes mutations from both the first and second 

3 5 rounds of mutagenesis. This resultant DNA can be used as a template in a third round of mutagenesis, and 

soon. 
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Cleavage of a Fusion Proteins With Subtilisin Variants 

A fusion protein is any polypeptide that contains within it an affinity domain (AD) that usually aids
in protein purification, a protease cleavage sequence or substrate linker (SL), which is cleaved by a protease,
and a protein product of interest (PP). Such fusion proteins are generally expressed by recombinant DNA
technology. The genes for fusion proteins are designed so that the SL is between the AD and PP. These
usually take the form AD-SL-PP such that the domain closest to the N-terminus is AD and PP is closest to
the C-terminus.

Examples of AD include glutathione-S-transferase, which binds to glutathione; protein A (or
derivatives or fragments thereof), which binds IgG molecules; poly-histidine sequences, particularly (His)6
(SEQ ID NO: 51), that bind metal affinity columns; maltose binding protein, which binds maltose; human
growth hormone, which binds the human growth hormone receptor; or any of a variety of other proteins or
protein domains that can bind to an immobilized affinity support with an association constant (Ka) of > 10^5
M^-1.

The SL can be any sequence which is cleaved by the subtilisin variants of the present invention. In
preparations where the variant N62D/Y104D/G166D or its equivalent is used, the SL can be any sequence,
preferably of at least 4 amino acids, in which the P4, P2, and P1 residues are basic residues. Therefore an SL
of the general formula P4-P3-P2-P1 is employed, wherein P4, P2, and P1 are basic amino acid
residues. Preferred SLs according to this aspect of the invention include Lys-Ala-Lys-Arg (SEQ ID NO: 87)
and Arg-Ala-Lys-Arg (SEQ ID NO: 86).

Likewise, where the N62D/G166D subtilisin variant is contemplated, the SL preferably contains dibasic
residues. For the variants capable of cleaving substrates containing dibasic residues, the SL should be
at least four residues long and preferably contain a large hydrophobic residue at P4 (such as Leu or Met) and
dibasic residues at P2 and P1 (such as Arg and Lys). A particularly good substrate is Leu-Met-Arg-Lys
(SEQ ID NO: 52), but a variety of other sequences may work, including Ala-Ser-Arg-Arg (SEQ ID NO: 50)
and even Leu-Thr-Ala-Arg (SEQ ID NO: 53).

It is often useful that the SL contain a flexible segment on its N-terminus to better separate it from the
AD and PP. Such sequences include Gly-Pro-Gly-Gly (SEQ ID NO: 54) but can be as simple as Gly-Gly
or Pro-Gly. Thus, an example of a particularly good SL would have the sequence Gly-Pro-Gly-Gly-Leu-
Met-Arg-Lys (SEQ ID NO: 88) in the case of subtilisin variants capable of cleaving substrates containing
dibasic amino acids, or Gly-Pro-Gly-Gly-Lys-Ala-Lys-Arg (SEQ ID NO: 89). This sequence
would be inserted between the AD and PP domains.

The PP can be virtually any protein or peptide of interest but preferably should not have a Pro, Ile,
Thr, Val, Asp or Glu as its first residue (P1'), or Pro or Gly at the second residue (P2'), or Pro at the third
residue (P3'). Such residues are poor substrates for the enzyme and may impair the ability of the subtilisin
variant to cleave the SL sequence.
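
The residue rules stated above for the SL and for the start of the PP can be expressed as a simple check. The following is only an illustrative sketch: the function names are hypothetical, residues are given as three-letter codes, and "basic" is taken to mean Lys or Arg, which is how the examples in this description use the term.

```python
# Illustrative sketch only; function names are hypothetical, not part of the disclosure.
BASIC = {"Lys", "Arg"}                      # "basic" is assumed to mean Lys or Arg
LARGE_HYDROPHOBIC_P4 = {"Leu", "Met"}       # examples given in the text for P4
BAD_P1_PRIME = {"Pro", "Ile", "Thr", "Val", "Asp", "Glu"}
BAD_P2_PRIME = {"Pro", "Gly"}
BAD_P3_PRIME = {"Pro"}


def last_four(sl):
    """Return (P4, P3, P2, P1), i.e. the last four residues of the substrate linker."""
    if len(sl) < 4:
        raise ValueError("the SL should contain at least four residues")
    return tuple(sl[-4:])


def fits_p4_p2_p1_basic_rule(sl):
    """Rule for N62D/Y104D/G166D-type variants: P4, P2 and P1 are basic."""
    p4, _p3, p2, p1 = last_four(sl)
    return p4 in BASIC and p2 in BASIC and p1 in BASIC


def fits_dibasic_rule(sl):
    """Rule for N62D/G166D-type variants: P2 and P1 are basic.

    Returns (dibasic, preferred_p4); the second flag reports the preferred,
    but not required, large hydrophobic residue at P4.
    """
    p4, _p3, p2, p1 = last_four(sl)
    return (p2 in BASIC and p1 in BASIC, p4 in LARGE_HYDROPHOBIC_P4)


def pp_start_problems(pp):
    """List the positions (P1', P2', P3') at which the PP starts with a disfavoured residue."""
    problems = []
    if len(pp) >= 1 and pp[0] in BAD_P1_PRIME:
        problems.append("P1'")
    if len(pp) >= 2 and pp[1] in BAD_P2_PRIME:
        problems.append("P2'")
    if len(pp) >= 3 and pp[2] in BAD_P3_PRIME:
        problems.append("P3'")
    return problems
```

For example, `fits_p4_p2_p1_basic_rule(["Lys", "Ala", "Lys", "Arg"])` is true for the preferred SL of SEQ ID NO: 87, while `pp_start_problems` flags a PP whose second residue is Pro.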

Cleavage of the fusion protein is best carried out in aqueous solution, although it should
be possible to immobilize the enzyme and cleave the soluble fusion protein. It may also be possible to cleave
the fusion protein while it remains immobilized on a solid support (e.g. bound to the solid support through the AD)
with the soluble subtilisin variant. It is preferable to add the enzyme to the fusion protein so that the enzyme
is less than one part in 100 (1:100) by weight. A good buffer is 10-30 mM Tris (pH 8.2) in 10 mM NaCl.
A preferable temperature is about 25°C, although the enzyme is active up to 65°C. The extent of cleavage
can be assayed by applying samples to SDS-PAGE. Generally, suitable conditions for using the subtilisin
variants of this invention do not depart substantially from those known in the art for the use of other
subtilisins.

EXAMPLES 

In the examples below and elsewhere, the following abbreviations are employed: subtilisin BPN',
subtilisin from Bacillus amyloliquefaciens; Boc-RVRR-MCA (SEQ ID NO. 73), N-t-butoxycarbonyl-
arginine-valine-arginine-arginine-7-amido-4-methyl coumarin; suc-Ala-Ala-Pro-Phe-pna (SEQ ID NO.
56), N-succinyl-alanine-alanine-proline-phenylalanine-p-nitroanilide; hGH, human
growth hormone; hGHbp, extracellular domain of the hGH receptor; PBS, phosphate buffered saline; AP,
alkaline phosphatase.

Example 1 

Construction and Purification of Subtilisin Mutants. 
Site-directed mutations were introduced into the subtilisin BPN' gene cloned into the phagemid pSS5
(Wells, J. A., Ferrari, E., Henner, D. J., Estell, D. A. and Chen, E. Y. (1983) Nucl. Acids Res. 11:7911-
7929). Single-stranded uracil-containing pSS5 template was prepared and mutagenesis performed using the
method of Kunkel (Kunkel, T. A., Bebenek, K. and McClary, J. (1991) Methods Enzymol. 204:125-139).
For example, the synthetic oligonucleotide N62D,

(5'-CCAAGACAACG*ACTCTCACGGAA-3') (SEQ ID NO. 25)

in which the asterisk denotes a mismatch to the wild-type sequence, was used to construct the N62D mutant.
The oligonucleotide was first phosphorylated at the 5' end using T4 polynucleotide kinase according to a
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). The phosphorylated oligonucleotide was
annealed to single-stranded uracil-containing pSS5 template, the complementary DNA strand was filled in
with deoxynucleotides using T7 DNA polymerase, and the resulting nicks ligated using T4 DNA ligase
according to a previously described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in
"Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). Heteroduplex
DNA was transformed into the E. coli host JM101 (Yanish-Perron, C., Viera, J., and Messing, J. (1985) Gene
33:103-199), and putative mutants were confirmed by preparation and dideoxy nucleotide sequencing of
single-stranded DNA (Sanger, F., Nicklen, S. and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA
74:5463-5467) according to the SEQUENASE® protocol (USB Biochemicals). Mutant single-stranded
DNA was then retransformed into JM101 cells and double-stranded DNA prepared according to a previously
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). For other mutations also requiring the use
of one primer, the oligonucleotides used are listed in Table 2. For several of these oligonucleotides,
additional silent mutations emplacing new restriction sites were simultaneously introduced to provide an
alternative verification of mutagenesis.
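
The effect of the single mismatch in the N62D oligonucleotide can be illustrated by translating the primer in frame. This is a minimal sketch under stated assumptions: the reading frame (first complete codon starting at the second base of the primer) and the wild-type base A at the mismatch position are inferred here from the Asn-to-Asp substitution (AAC to GAC), and the helper names are hypothetical.

```python
# Minimal codon table covering only the codons that occur in the N62D primer.
CODONS = {"CAA": "Gln", "GAC": "Asp", "AAC": "Asn",
          "TCT": "Ser", "CAC": "His", "GGA": "Gly"}

PRIMER = "CCAAGACAACGACTCTCACGGAA"  # SEQ ID NO. 25, asterisk removed
MISMATCH_INDEX = 10                  # 0-based index of the G marked with '*'
FRAME_OFFSET = 1                     # assumption: first full codon starts at index 1


def translate(dna, offset):
    """Translate the complete codons of `dna`, starting at `offset`."""
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        residues.append(CODONS[dna[i:i + 3]])
    return residues


def revert_mismatch(dna, index, wild_type_base):
    """Return `dna` with the base at `index` replaced by the wild-type base."""
    return dna[:index] + wild_type_base + dna[index + 1:]


mutant = translate(PRIMER, FRAME_OFFSET)
# Assumption: the wild-type base at the mismatch is A (AAC codes for Asn).
wild_type = translate(revert_mismatch(PRIMER, MISMATCH_INDEX, "A"), FRAME_OFFSET)
```

Under these assumptions the fourth codon covered by the primer reads GAC (Asp) in the mutant and AAC (Asn) in the reverted sequence, which corresponds to the N62D substitution.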

To construct the double mutants N62D/G166D and N62D/G166E, pSS5 DNA containing the N62D
mutation was produced in single-stranded uracil-containing form using the Kunkel procedure (Kunkel, T.
A., Bebenek, K. and McClary, J. (1991) Methods Enzymol. 204:125-139). This mutant DNA was used as
template for the further introduction of the G166D or G166E mutations, using the appropriate
oligonucleotide primers (see sequences in Table 2), following the procedures described above.

To construct the triple mutants, such as N62D/Y104D/G166D, pSS5 DNA containing the
N62D/G166D mutation, or other appropriate double mutation, was produced in single-stranded uracil-
containing form using the Kunkel procedure (Kunkel, T. A., Bebenek, K. and McClary, J. (1991) Methods
Enzymol. 204:125-139). This mutant DNA was used as template for the further introduction of the Y104D
mutations, using the appropriate oligonucleotide primers (see sequences in Table 4), following the
procedures described above.
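
The stepwise construction of double and triple mutants, in which each round uses the product of the previous round as template, can be mirrored in a short sketch. The code below is illustrative only: the function names are hypothetical, and the "template" is reduced to a dictionary holding just the residues at the positions named in the variant.

```python
import re


def parse_variant(name):
    """Split a name such as 'N62D/G166D' into (wild_type, position, replacement) tuples."""
    substitutions = []
    for part in name.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", part)
        if match is None:
            raise ValueError(f"unrecognised substitution: {part!r}")
        substitutions.append((match.group(1), int(match.group(2)), match.group(3)))
    return substitutions


def apply_round(residues, substitution):
    """One round of mutagenesis: check the template residue, then substitute it."""
    wild_type, position, replacement = substitution
    if residues.get(position) != wild_type:
        raise ValueError(f"template does not carry {wild_type} at position {position}")
    updated = dict(residues)          # the previous template is left unchanged
    updated[position] = replacement
    return updated


def build_variant(residues, name):
    """Apply the substitutions of `name` one round after another."""
    for substitution in parse_variant(name):
        residues = apply_round(residues, substitution)
    return residues


# Wild-type residues at the three positions, as implied by the variant names.
WILD_TYPE = {62: "N", 104: "Y", 166: "G"}
double_mutant = build_variant(WILD_TYPE, "N62D/G166D")
triple_mutant = apply_round(double_mutant, ("Y", 104, "D"))
```

Here `triple_mutant` is obtained from the double mutant in one further round, in the same order as the construction described above.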

For expression of the subtilisin BPN' mutants, double-stranded mutant DNA was transformed into a
protease-deficient strain (BG2036) of Bacillus subtilis (Yang, M. Y., Ferrari, E. and Henner, D. J.
(1984) Journal of Bacteriology 160:15-21) according to a previous method (Anagnostopoulos, C. and
Spizizen, J. (1961) Journal of Bacteriology 81:741-746) in which transformation mixtures were plated out
on LB plus skim milk plates containing 12.5 µg/mL chloramphenicol. The clear halos indicative of skim
milk digestion surrounding transformed colonies were noted to roughly estimate secreted protease activity.

The transformed BG2036 strains were cultured by inoculating 5 mL of 2xYT media (Miller, J. H.,
(1972) in "Experiments in Molecular Genetics," Cold Spring Harbor, N.Y.) containing 12.5 µg/mL
chloramphenicol and 2 mM CaCl2 at 37°C for 18-20 h, followed by 1:100 dilution in the same medium and
growth in shake flasks at 37°C for 18-22 h with vigorous aeration. The cells were harvested by
centrifugation (6000g, 15 min, 4°C), and to the supernatant 20 mM (final) CaCl2 and one volume of ethanol
(-20°C) were added. After 30 min at 4°C, the solution was centrifuged (12,000g, 15 min, 4°C), and one
volume of ethanol (-20°C) added to the supernatant. After 2 h at -20°C, the solution was centrifuged
(12,000g, 15 min, 4°C) and the pellet resuspended in and dialyzed against MC (25 mM 2-(N-
morpholino)ethanesulfonic acid (MES), 5 mM CaCl2 at pH 5.5) overnight at 4°C. The dialysate was passed
through a 0.22 µm syringe filter and loaded onto a mono-S cation exchange column run by an FPLC system
(Pharmacia Biotechnology). The column was washed with 20 volumes of MC and mutant subtilisin eluted
over a linear gradient of zero to 0.15 M NaCl in MC, all at a flow rate of 1 mL/min. Peak fractions were
recovered and the subtilisin mutant quantitated by measuring the absorbance at 280 nm (E280 0.1% = 1.17)
(Matsubara, H.; Kasper, C. B.; Brown, D. M.; and Smith, E. L. (1965) J. Biol. Chem., 240:1125-1130).
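
The quantitation from absorbance at 280 nm amounts to a single division, since E280 0.1% = 1.17 is the absorbance of a 1 mg/mL (0.1% w/v) solution. The sketch below is illustrative only: the function names are hypothetical, a 1 cm path length is assumed by default, and the molecular weight needed for a molar concentration is not given in this description and must be supplied by the user.

```python
E280_0_1_PERCENT = 1.17  # absorbance of a 1 mg/mL solution, from the text


def subtilisin_mg_per_ml(a280, path_cm=1.0):
    """Protein concentration in mg/mL from A280 (1 cm path assumed by default)."""
    return a280 / (E280_0_1_PERCENT * path_cm)


def subtilisin_molar(a280, mw_g_per_mol, path_cm=1.0):
    """Molar concentration; mg/mL equals g/L, so dividing by g/mol gives mol/L."""
    return subtilisin_mg_per_ml(a280, path_cm) / mw_g_per_mol


# Hypothetical illustration: 27,500 g/mol is an assumed, approximate molecular
# weight used only to show the unit conversion; it is not taken from the text.
example_molar = subtilisin_molar(1.17, 27500.0)
```

With these assumptions an A280 of 1.17 corresponds to 1 mg/mL, and the nanomolar enzyme concentrations used in the kinetic assays follow by dilution.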
Example 2

Kinetic Characterizations 

Subtilisins were assayed by measuring the initial rates of hydrolysis of p-nitroanilide tetrapeptide
substrates in 0.4 mL 20 mM Tris-Cl pH 8.2, 4% (v/v) dimethyl sulfoxide at (25 ± 0.2)°C as described
previously (Estell, D. A., Graycar, T. P., Miller, J. V., Powers, D. B., Burnier, J. P., Ng, P. G. and Wells,
J. A. (1986) Science 233:659-663). Enzyme concentrations [E]0 were determined spectrophotometrically
using E280nm 0.1% = 1.17 (Matsubara, H.; Kasper, C. B.; Brown, D. M.; and Smith, E. L. (1965) J. Biol.
Chem., 240:1125-1130), and were typically 5-50 nM in reactions. Initial rates were determined for nine to
twelve different substrate concentrations over the range of 0.001-2.0 mM. Plots of initial rates (v) versus
substrate concentration [S] were fitted to the Michaelis-Menten equation,

$$v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]}$$

to determine the kinetic constants kcat and Km (Fersht, A. in "Enzyme Structure and Mechanism", Second
edition, Freeman and Co., N.Y.) using the program Kaleidagraph (Synergy Software, Reading, PA).
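
The extraction of kcat and Km from initial-rate data can be sketched as follows. This is not the fitting procedure used in the example, which relied on curve fitting in Kaleidagraph; as a simple stand-in the sketch uses the Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax with Vmax = kcat[E]0, solved by ordinary least squares. The function names and the synthetic data are hypothetical.

```python
def michaelis_menten_rate(s, kcat, km, e0):
    """Initial rate v = kcat*[E]0*[S] / (Km + [S])."""
    return kcat * e0 * s / (km + s)


def hanes_woolf_fit(substrate, rates, e0):
    """Estimate (kcat, Km) by linear least squares on [S]/v versus [S].

    Stand-in for a nonlinear fit: slope = 1/Vmax, intercept = Km/Vmax.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax / e0, km


# Synthetic, noise-free illustration (hypothetical values, concentrations in mM,
# rates in mM/s): nine substrate concentrations spanning 0.001-2.0 mM, 20 nM enzyme.
E0_MM = 2.0e-5
S_MM = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 0.6, 1.0, 2.0]
V = [michaelis_menten_rate(s, kcat=50.0, km=0.15, e0=E0_MM) for s in S_MM]
kcat_fit, km_fit = hanes_woolf_fit(S_MM, V, E0_MM)
```

With real, noisy data a weighted or nonlinear fit of the untransformed equation is generally preferable, because the linearization distorts the error structure.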

Example 3 

Substrate Phage 

Substrate phage selections were performed as described by Matthews and Wells (Matthews, D. J. and
Wells, J. A. (1993) Science 260:1113-1117), with minor modifications. Phage sorting was carried out using
a library in which the linker sequence between the gene III coat protein and a tight-binding variant of hGH
was GPGGX5GGPG (SEQ ID NO. 52). The library contained 2 x 10* independent transformants. Phage
particles were prepared by infecting 1 mL of log phase 27C7 (F'/tetR/Ompt-/degP-) Escherichia coli with
approximately 10^8 library phage for 1 h at 37°C, followed by 18-24 h of growth in 25 mL 2YT medium
containing 10^10 M13K07 helper phage and 50 µg/mL carbenicillin at 37°C. Wells of a 96-well Nunc
Maxisorb microtiter plate were coated with 2 µg/mL of hGHbp in 50 mM NaHCO3 at pH 9.6 overnight at
4°C and blocked with PBS (10 mM sodium phosphate at pH 7.4 and 150 mM NaCl) containing 2.5% (w/v)
skim milk for 1 h at room temperature. Between 10^11 and 10^12 phage in 0.1 mL 10 mM Tris-Cl (pH 7.6), 1
mM EDTA, and 100 mM NaCl were incubated in the wells at room temperature for 2 h with gentle agitation.
The plate was washed first with 20 rinses of PBS plus 0.05% Tween 20 and then twice with 20 mM Tris-Cl
at pH 8.2. The N62D/G166D subtilisin was added in 0.1 mL of 20 mM Tris-Cl at pH 8.2 and protease
sensitive phage were eluted after a variable reaction time. The concentration of protease and incubation
times for elution of sensitive phage were decreased gradually over the course of the sorting procedure to increase
selectivity, with protease concentrations of 0.2 nM (rounds 1-3) and 0.1 nM (rounds 4-9), and reaction times
of 5 min (rounds 1-6), 2.5 min (round 7), 40 s (round 8) and 20 s (round 9). Control wells in which no
protease was added were also included in each round. For the resistant phage pool, the incubation time with
protease remained constant at 5 min. The wells were then washed ten times with PBS plus 0.05% Tween
20 and resistant phage eluted by treatment with 0.1 mL of 0.2 M glycine at pH 2.0 in PBS plus 0.05% Tween
20 for 1 min at room temperature. Protease sensitive and resistant phage pools were titered and used to
infect log phase 27C7 cells for 1 h at 37°C, followed by centrifugation at 4000 rpm, removal of supernatant,
and resuspension in 1 mL 2YT medium. The infected cells were then grown 18-24 h in the presence of
helper phage as described above and the process repeated 9 times. Selected substrates were introduced into
AP fusion proteins and assayed for relative rates of cleavage as described by Matthews and Wells (Matthews,
D. J., Goodman, L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews,
D. J. and Wells, J. A. (1993) Science 260:1113-1117), except that the cleavage reactions were performed in
20 mM Tris-Cl at pH 8.2.

Example 4 

Substrate phage selection and cleavage of a fusion protein

Subtilisin has the capability to bind substrates from the P4 to P3' positions (McPhalen, C. A. and
James, N. G. (1988) Biochemistry 27:6582-6598 and Bode, W., Papamokos, E., Musil, D., Seemueller, U.
and Fritz, M. (1986) EMBO J. 5:813-818). Given this extensive binding site and the apparent cooperative
nature in the way the substrate can bind the enzyme, we wished to explore more broadly the substrate
preferences for the enzyme. To do this we utilized the substrate phage selection (Matthews, D. J., Goodman,
L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D. J. and Wells,
J. A. (1993) Science 260:1113-1117) described in Example 3. In this method a five-residue substrate linker
that is flanked by di-glycine residues is inserted between an affinity domain (in this case a high affinity
variant of hGH) and the carboxy-terminal domain of gene III, a minor coat protein displayed on the surface
of the filamentous phage, M13. The five-residue substrate linker is fully randomized to generate a library of
20^5 different protein sequence variants. These are displayed on the phage particles, which are allowed to bind
to the hGHbp. The protease of interest is added and, if it cleaves the phage particle at the substrate linker,
it releases that particle. The particles released by protease treatment can be propagated and subjected to
another round of selection to further enrich for good protease substrates. Sequences that are retained can
also be propagated to enrich for poor protease substrates. By sequencing the isolated phage genes at the end
of either selection one can identify good and poor substrates for further analysis.

We chose to focus on the subtilisin BPN' variant N62D/G166D as it was slightly better at
discriminating the synthetic dibasic substrates from the others. We subjected the substrate phage library to
nine rounds of selection with the subtilisin variant and isolated clones that were either increasingly sensitive
or resistant to cleavage. Of twenty-one clones sequenced from the sensitive pool, eighteen contained dibasic
residues, eleven of which had the substrate linker sequence Asn-Leu-Met-Arg-Lys (SEQ ID NO: 35) (Table
6).
TABLE 6

Substrate phage sequences sensitive or resistant to N62D/G166D subtilisin from a GG-xxxxx-GG library
after 9 rounds of selection*.

Protease Sensitive Pool

  No Basic Sites (0)

  Monobasic Sites (3)
    N L T A R (3)     (SEQ ID NO: 34)

  Dibasic Sites (18)
    N L M R K (11)    (SEQ ID NO: 35)
    T A S R R (4)     (SEQ ID NO: 36)
    L T R R S         (SEQ ID NO: 37)
    A L S R K         (SEQ ID NO: 38)
    L M L R R         (SEQ ID NO: 39)

Protease Resistant Pool

  Monobasic Sites (2)
    Q K P N F         (SEQ ID NO: 41)
    R P G A M         (SEQ ID NO: 44)

  Dibasic Sites (1)
    R K P T H         (SEQ ID NO: 42)

  No Basic Sites (7)
    A S T H F         (SEQ ID NO: 40)
    I Q Q Q Y         (SEQ ID NO: 43)
    Q G E L P         (SEQ ID NO: 45)
    A P D P T         (SEQ ID NO: 46)
    Q L L E H         (SEQ ID NO: 47)
    V N N N H         (SEQ ID NO: 48)
    A Q S N L         (SEQ ID NO: 49)

* Numbers in parentheses indicate the number of times a particular DNA sequence was isolated.

Three (3) of the sensitive sequences were monobasic, Asn-Leu-Thr-Ala-Arg (SEQ ID NO: 34). It is
known that subtilisin has a preference for hydrophobic residues at the P4 position. If these and the other
selected substrates were indeed cleaved after the last basic residue, they all would have a Leu, Met or Ala at
the P4 position. Almost no basic residues were isolated in the protease resistant pool, and those that were had
a Pro following the mono- or dibasic residue. It is known that subtilisin does not cleave substrates containing
Pro at the P1' position (Carter, P., Nilsson, B., Burnier, J., Burdick, D. and Wells, J. A. (1989) Proteins:
Struct., Funct., Genet. 6:240-248). Thus, dibasic substrates were highly selected and these had the
additional feature of Leu, Met or Ala at the P4 position.
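
The observations drawn from Table 6 can be reproduced with a short classification sketch. It is illustrative only: the function names are hypothetical, "basic" is taken to mean Lys or Arg (His is not counted, in line with the table), cleavage is assumed to occur after the last basic residue as discussed above, and the flanking residues are taken from the GPGGX5GGPG linker of Example 3.

```python
BASIC = set("KR")          # His is not counted as basic, as in Table 6
N_FLANK = "GPGG"           # residues preceding the randomized linker
C_FLANK = "GGPG"           # residues following the randomized linker
LIBRARY_SIZE = 20 ** 5     # fully randomized five-residue linker


def basic_class(linker):
    """Classify a linker as 'dibasic', 'monobasic' or 'none'."""
    if any(a in BASIC and b in BASIC for a, b in zip(linker, linker[1:])):
        return "dibasic"
    if any(residue in BASIC for residue in linker):
        return "monobasic"
    return "none"


def _p1_position(linker):
    """Index, in the flanked sequence, of the last basic residue (assumed P1)."""
    positions = [i for i, residue in enumerate(linker) if residue in BASIC]
    return positions[-1] + len(N_FLANK) if positions else None


def p4_residue(linker):
    """Residue three positions before the assumed P1, or None if no basic residue."""
    p1 = _p1_position(linker)
    return None if p1 is None else (N_FLANK + linker + C_FLANK)[p1 - 3]


def p1_prime_residue(linker):
    """Residue immediately after the assumed P1, or None if no basic residue."""
    p1 = _p1_position(linker)
    return None if p1 is None else (N_FLANK + linker + C_FLANK)[p1 + 1]


SENSITIVE = ["NLTAR", "NLMRK", "TASRR", "LTRRS", "ALSRK", "LMLRR"]
RESISTANT_WITH_BASIC = ["QKPNF", "RPGAM", "RKPTH"]
```

Under these assumptions every sensitive sequence has Leu, Met or Ala at P4, and every resistant sequence that contains a basic residue has Pro at P1'.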

Example 5 

Cleavage of Substrate Linkers

We wished to analyze how efficiently the most frequently selected sequences were cleaved in the
context of a fusion protein. For this we applied an alkaline phosphatase-fusion protein assay (Matthews, D.
J., Goodman, L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D.
J. and Wells, J. A. (1993) Science 260:1113-1117). The hGH substrate linker domains were excised from the
phage vector by PCR and fused in front of the gene for E. coli AP. The fusion protein was expressed and
purified on an hGH receptor affinity column. The fusion protein was bound to the hGH receptor on a plate
and treated with the subtilisin variant. The rate of cleavage of the fusion protein from the plate was monitored
by collecting soluble fractions as a function of time and assaying for AP activity (Figure 5). The most
frequently isolated substrate sequence, Asn-Leu-Met-Arg-Lys (SEQ ID NO: 35), was cleaved about ten times
faster than the next most frequently isolated clones (Thr-Ala-Ser-Arg-Arg (SEQ ID NO: 36) and Asn-Leu-
Thr-Ala-Arg (SEQ ID NO: 34)). The cleaved AP products were also recovered and subjected to N-terminal
sequencing to determine the sites of cleavage (the cleavage sites are indicated in Figure 5). In all three fusion
proteins, this site was immediately following the dibasic or monobasic site, according to the mutant subtilisin
design. We also tested the dibasic sequence isolated from the resistant pool, namely Arg-Lys-Pro-Thr-His
(SEQ ID NO: 42). We observed no detectable cleavage above background for this substrate during the
assay.


The present invention has of necessity been discussed herein by reference to certain specific methods
and materials. It is to be understood that the discussion of these specific methods and materials in no way
constitutes any limitation on the scope of the present invention, which extends to any and all alternative
materials and methods suitable for accomplishing the ends of the present invention.


All references cited herein are expressly incorporated by reference. 
SEQUENCE LISTING

(1) GENERAL INFORMATION:

    (i) APPLICANT: Genentech, Inc.

    (ii) TITLE OF INVENTION: SUBTILISIN VARIANTS CAPABLE OF CLEAVING
         SUBSTRATES CONTAINING BASIC RESIDUES

    (iii) NUMBER OF SEQUENCES: 89

    (iv) CORRESPONDENCE ADDRESS:
         (A) ADDRESSEE: Genentech, Inc.
         (B) STREET: 460 Point San Bruno Blvd
         (C) CITY: South San Francisco
         (D) STATE: California
         (E) COUNTRY: USA
         (F) ZIP: 94080

    (v) COMPUTER READABLE FORM:
         (A) MEDIUM TYPE: 3.5 inch, 1.44 Mb floppy disk
         (B) COMPUTER: IBM PC compatible
         (C) OPERATING SYSTEM: PC-DOS/MS-DOS
         (D) SOFTWARE: WinPatin (Genentech)

    (vi) CURRENT APPLICATION DATA:
         (A) APPLICATION NUMBER:
         (B) FILING DATE:
         (C) CLASSIFICATION:

    (vii) PRIOR APPLICATION DATA:
         (A) APPLICATION NUMBER: 08/398028
         (B) FILING DATE: 03-MAR-1995

    (viii) ATTORNEY/AGENT INFORMATION:
         (A) NAME: Kubinec, Jeffrey S.
         (B) REGISTRATION NUMBER: 36,575
         (C) REFERENCE/DOCKET NUMBER: P0936P1PCT

    (ix) TELECOMMUNICATION INFORMATION:
         (A) TELEPHONE: 415/225-8228
         (B) TELEFAX: 415/952-9881
         (C) TELEX: 910/371-7168

(2) INFORMATION FOR SEQ ID NO:1:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 8119 base pairs
         (B) TYPE: Nucleic Acid
         (C) STRANDEDNESS: Single
         (D) TOPOLOGY: Linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:


GAATTCNGGT CTACTAAAAT ATTATTCCAT ACTATACAAT TAATACACAG 50 

AATAATCTGT CTATTGGTTA TTCTGCAAAT GAAAAAAAGG AGAGGATAAA 100 

GA GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT  138
   Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe
   -107    -105                -100

GCT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA  177
Ala Leu Ala Leu Ile Phe Thr Met Ala Phe Gly Ser Thr
-95                 -90                 -85

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG  216
Ser Ser Ala Gln Ala Ala Gly Lys Ser Asn Gly Glu Lys
        -80                 -75                 -70

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG  255
Lys Tyr Ile Val Gly Phe Lys Gln Thr Met Ser Thr Met
                -65                 -60

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC  294
Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu Lys Gly
    -55                 -50                 -45

GGG AAA GTG CAA AAG CAA TTC AAA TAT GTA GAC GCA GCT  333
Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala
            -40                 -35

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA  372
Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30                 -25                 -20

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA  411
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val
        -15                 -10                 -5

GCA CAT GCG TAC GCG CAG TCC GTG CCT TAC GGC GTA TCA  450
Ala His Ala Tyr Ala Gln Ser Val Pro Tyr Gly Val Ser
                1               5

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT  489
Gln Ile Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr
10                  15                  20

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC  528
Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
        25                  30                  35

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC  567
Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
                40                  45

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC  606
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn
    50                  55                  60

GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT  645
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala
            65                  70

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC  684
Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser
75                  80                  85

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT  723
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly
        90                  95                  100

TCC GGC CAA TAC AGC TGG ATC ATT AAC GGA ATC GAG TGG  762
Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp
                105                 110

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC  801
Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu
    115                 120                 125


-30- 


WO 96/27671 


PCT/US96/02861 


GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT  840
Gly Gly Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val
            130                 135

GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA  879
Asp Lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala
140                 145                 150

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG  918
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
        155                 160                 165

GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC  957
Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
                170                 175

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC  996
Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser
    180                 185                 190

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT  1035
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser
            195                 200

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC  1074
Ile Gln Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr
205                 210                 215

AAC GGT ACC TCA ATG GCA TCT CCG CAC GTT GCC GGA GCG  1113
Asn Gly Thr Ser Met Ala Ser Pro His Val Ala Gly Ala
        220                 225                 230

GCT GCT TTG ATT CTT TCT AAG CAC CCG AAC TGG ACA AAC  1152
Ala Ala Leu Ile Leu Ser Lys His Pro Asn Trp Thr Asn
                235                 240

ACT CAA GTC CGC AGC AGT TTA GAA AAC ACC ACT ACA AAA  1191
Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
    245                 250                 255

CTT GGT GAT TCT TTC TAC TAT GGA AAA GGG CTG ATC AAC  1230
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn
            260                 265

GTA CAG GCG GCA GCT CAG TA AAACATAAAA AACCGGCCTT  1270
Val Gln Ala Ala Ala Gln
270                 275



GGCCCCGCCG GTTTTTTATT ATTTTTCTTC CTCCGCATGT TCAATCCGCT 1320
CCATAATCGA CGGATGGCTC CCTCTGAAAA TTTTAACGAG AAACGGCGGG 1370
TTGACCCGGC TCAGTCCCGT AACGGCCAAG TCCTGAAACG TCTCAATCGC 1420
CGCTTCCCGG TTTCCGGTCA GCTCAATGCC GTAACGGTCG GCGGCGTTTT 1470
CCTGATACCG GGAGACGGCA TTCGTAATCG GATCCGGAAA TTGTAAACGT 1520
TAATATTTTG TTAAAATTCG CGTTAAATTT TTGTTAAATC AGCTCATTTT 1570
TTAACCAATA GGCCGAAATC GGCAAAATCC CTTATAAATC AAAAGAATAG 1620
ACCGAGATAG GGTTGAGTGT TGTTCCAGTT TGGAACAAGA GTCCACTATT 1670
AAAGAACGTG GACTCCAACG TCAAAGGGCG AAAAACCGTC TATCAGGGCT 1720
ATGGCCCACT ACGTGAACCA TCACCCTAAT CAAGTTTTTT GGGGTCGAGG 1770
TGCCGTAAAG CACTAAATCG GAACCCTAAA GGGAGCCCCC GATTTAGAGC 1820 
TTGACGGGGA AAGCCGGCGA ACGTGGCGAG AAAGGAAGGG AAGAAAGCGA 1870 
AAGGAGCGGG CGCTAGGGCG CTGGCAAGTG TAGCGGTCAC GCTGCGCGTA 1920 
ACCACCACAC CCGCCGCGCT TAATGCGCCG CTACAGGGCG CGTCCGGATC 1970 
NGATCCGACG CGAGGCTGGA TGGCCTTCCC CATTATGATT CTTCTCGCTT 2020 
CCGGCGGCAT CGGGATGCCC GCGTTGCAGG CCATGCTGTC CAGGCAGGTA 2070 
GATGACGACC ATCAGGGACA GCTTCAAGGA TCGCTCGCGG CTCTTACCAG 2120 
CCTAACTTCG ATCACTGGAC CGCTGATCGT CACGGCGATT TATGCCGCCT 2170 
CGGCGAGCAC ATGGAACGGG TTGGCATGGA TTGTAGGCGC CGCCCTATAC 2220 
CTTGTCTGCC TCCCCGCGTT GCGTCGCGGT GCATGGAGCC GGGCCACCTC 2270 
GACCTGAATG GAAGCCGGCG GCACCTCGCT AACGGATTCA CCACTCCAAG 2320 
AATTGGAGCC AATCAATTCT TGCGGAGAAC TGTGAATGCG CAAACCAACC 2370 
CTTGGCAGAA CATATCCATC GCGTCCGCCA TCTCCAGCAG CCGCACGCGG 2420 
CGCATCTCGG GCCGCGTTGC TGGCGTTTTT CCATAGGCTC CGCCCCCCTG 2470 
ACGAGCATCA CAAAAATCGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA 2520 
GGACTATAAA GATACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC 2570 
TCCTGTTCCG ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT 2620 
CGGGAAGCGT GGCGCTTTCT CAATGCTCAC GCTGTAGGTA TCTCAGTTCG 2670 
GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT GTGCACGAAC CCCCCGTTCA 2720
GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG TCCAACCCGG 2770 
TAAGACACGA CTTATCGCCA CTGGCAGCAG CCACTGGTAA CAGGATTAGC 2820 
AGAGCGAGGT ATGTAGGCGG TGCTACAGAG TTCTTGAAGT GGTGGCCTAA 2870 
CTACGGCTAC ACTAGAAGGA CAGTATTTGG TATCTGCGCT CTGCTGAAGC 2920
CAGTTACCTT CGGAAAAAGA GTTGGTAGCT CTTGATCCGG CAAACAAACC 2970
ACCGCTGGTA GCGGTGGTTT TTTTGTTTGC AAGCAGCAGA TTACGCGCAG 3020 
AAAAAAAGGA TCTCAAGAAG ATCCTTTGAT CTTTTCTACG GGGTCTGACG 3070 
CTCAGTGGAA CGAAAACTCA CGTTAAGGGA TTTTGGTCAT GAGATTATCA 3120 
AAAAGGATCT TCACCTAGAT CCTTTTAAAT TAAAAATGAA GTTTTAAATC 3170 
AATCTAAAGT ATATATGAGT AAACTTGGTC TGACAGTTAC CAATGCTTAA 3220
TCAGTGAGGC ACCTATCTCA GCGATCTGTC TATTTCGTTC ATCCATAGTT 3270 
GCCTGACTCC CCGTCGTGTA GATAACTACG ATACGGGAGG GCTTACCATC 3320 
TGGCCCCAGT GCTGCAATGA TACCGCGAGA CCCACGCTCA CCGGCTCCAG 3370 
ATTTATCAGC AATAAACCAG CCAGCCGGAA GGGCCGAGCG CAGAAGTGGT 3420
CCTGCAACTT TATCCGCCTC CATCCAGTCT ATTAATTGTT [illegible] 3470
TAGAGTAAGT AGTTCGCCAG TTAATAGTTT GCGCAACGTT [illegible] 3520
CTGCAGGCAT CGTGGTGTCA CGCTCGTCGT TTGGTATGGC [illegible] 3570
TCCGGTTCCC AACGATCAAG GCGAGTTACA TGATCCCCCA TGTTGTGCAA 3620
AAAAGCGGTT AGCTCCTTCG GTCCTCCGAT CGTTGTCAGA AGTAAGTTGG 3670
CCGCAGTGTT ATCACTCATG GTTATGGCAG CACTGCATAA TTCTCTTACT 3720
GTCATGCCAT CCGTAAGATG CTTTTCTGTG ACTGGTGAGT ACTCAACCAA 3770
GTCATTCTGA GAATAGTGTA TGCGGCGACC GAGTTGCTCT TGCCCGGCGT 3820
CAACACGGGA TAATACCGCG CCACATAGCA GAACTTTAAA AGTGCTCATC 3870
ATTGGAAAAC GTTCTTCGGG GCGAAAACTC TCAAGGATCT TACCGCTGTT 3920
GAGATCCAGT TCGATGTAAC CCACTCGTGC ACCCAACTGA TCTTCAGCAT 3970
CTTTTACTTT CACCAGCGTT TCTGGGTGAG CAAAAACAGG AAGGCAAAAT 4020
GCCGCAAAAA AGGGAATAAG GGCGACACGG AAATGTTGAA TACTCATACT 4070
CTTCCTTTTT CAATATTATT GAAGCATTTA TCAGGGTTAT TGTCTCATGA 4120
GCGGATACAT ATTTGAATGT ATTTAGAAAA ATAAACAAAT AGGGGTTCCG 4170
CGCACATTTC CCCGAAAAGT GCCACCTGAC GTCTAAGAAA CCATTATTAT 4220
CATGACATTA ACCTATAAAA ATAGGCGTAT CACGAGGCCC TTTCGTCTTC 4270
AAGAATTAAT TCCTTAAGGA ACGTACAGAC GGCTTAAAAG CCTTTAAAAA 4320
CGTTTTTAAG GGGTTTGTAG ACAAGGTAAA GGATAAAACA GCACAATTCC 4370
AAGAAAAACA CGATTTAGAA CCTAAAAAGA ACGAATTTGA ACTAACTCAT 4420
AACCGAGAGG TAAAAAAAGA ACGAAGTCGA GATCAGGGAA TGAGTTTATA 4470
AAATAAAAAA AGCACCTGAA AAGGTGTCTT TTTTTGATGG TTTTGAACTT 4520
GTTCTTTCTT ATCTTGATAC ATATAGAAAT AACGTCATTT TTATTTTAGT 4570
TGCTGAAAGG TGCGTTGAAG TGTTGGTATG TATGTGTTTT AAAGTATTGA 4620
AAACCCTTAA AATTGGTTGC ACAGAAAAAC CCCATCTGTT AAAGTTATAA 4670
GTGACTAAAC AAATAACTAA ATAGATGGGG GTTTCTTTTA ATATTATGTG 4720
TCCTAATAGT AGCATTTATT CAGATGAAAA ATCAAGGGTT TTAGTGGACA 4770
AGACAAAAAG TGGAAAAGTG AGACCATGGA GAGAAAAGAA AATCGCTAAT 4820
GTTGATTACT TTGAACTTCT GCATATTCTT GAATTTAAAA AGGCTGAAAG 4870
AGTAAAAGAT TGTGCTGAAA TATTAGAGTA TAAACAAAAT CGTGAAACAG 4920
GCGAAAGAAA GTTGTATCGA GTGTGGTTTT GTAAATCCAG GCTTTGTCCA 4970
ATGTGCAACT GGAGGAGAGC AATGAAACAT GGCATTCAGT CACAAAAGGT 5020
TGTTGCTGAA GTTATTAAAC AAAAGCCAAC AGTTCGTTGG TTGTTTCTCA 5070
CATTAACAGT TAAAAATGTT TATGATGGCG AAGAATTAAA TAAGAGTTTG 5120 
TCAGATATGG CTCAAGGATT TCGCCGAATG ATGCAATATA AAAAAATTAA 5170 
TAAAAATCTT GTTGGTTTTA TGCGTGCAAC GGAAGTGACA ATAAATAATA 5220 
AAGATAATTC TTATAATCAG CACATGCATG TATTGGTATG TGTGGAACCA 5270 
ACTTATTTTA AGAATACAGA AAACTACGTG AATCAAAAAC AATGGATTCA 5320 
ATTTTGGAAA AAGGCAATGA AATTAGACTA TGATCCAAAT GTAAAAGTTC 5370 
AAATGATTCG ACCGAAAAAT AAATATAAAT CGGATATACA ATCGGCAATT 5420 
GACGAAACTG CAAAATATCC TGTAAAGGAT ACGGATTTTA TGACCGATGA 5470 
TGAAGAAAAG AATTTGAAAC GTTTGTCTGA TTTGGAGGAA GGTTTACACC 5520 
GTAAAAGGTT AATCTCCTAT GGTGGTTTGT TAAAAGAAAT ACATAAAAAA 5570
TTAAACCTTG ATGACACAGA AGAAGGCGAT TTGATTCATA CAGATGATGA 5620 
CGAAAAAGCC GATGAAGATG GATTTTCTAT TATTGCAATG TGGAATTGGG 5670 
AACGGAAAAA TTATTTTATT AAAGAGTAGT TCAACAAACG GGCCAGTTTG 5720 
TTGAAGATTA GATGCTATAA TTGTTATTAA AAGGATTGAA GGATGCTTAG 5770 
GAAGACGAGT TATTAATAGC TGAATAAGAA CGGTGCTCTC CAAATATTCT 5820 
TATTTAGAAA AGCAAATCTA AAATTATCTG AAAAGGGAAT GAGAATAGTG 5870
AATGGACCAA TAATAATGAC TAGAGAAGAA AGAATGAAGA TTGTTCATGA 5920 
AATTAAGGAA CGAATATTGG ATAAATATGG GGATGATGTT AAGGCTATTG 5970 
GTGTTTATGG CTCTCTTGGT CGTCAGACTG ATGGGCCCTA TTCGGATATT 6020 
GAGATGATGT GTGTCATGTC AACAGAGGAA GCAGAGTTCA GCCATGAATG 6070 
GACAACCGGT GAGTGGAAGG TGGAAGTGAA TTTTGATAGC GAAGAGATTC 6120 
TACTAGATTA TGCATCTCAG GTGGAATCAG ATTGGCCGCT TACACATGGT 6170 
CAATTTTTCT CTATTTTGCC GATTTATGAT TCAGGTGGAT ACTTAGAGAA 6220 
AGTGTATCAA ACTGCTAAAT CGGTAGAAGC CCAAACGTTC CACGATGCGA 6270 
TTTGTGCCCT TATCGTAGAA GAGCTGTTTG AATATGCAGG CAAATGGCGT 6320 
AATATTCGTG TGCAAGGACC GACAACATTT CTACCATCCT TGACTGTACA 6370 
GGTAGCAATG GCAGGTGCCA TGTTGATTGG TCTGCATCAT CGCATCTGTT 6420 
ATACGACGAG CGCTTCGGTC TTAACTGAAG CAGTTAAGCA ATCAGATCTT 6470
CCTTCAGGTT ATGACCATCT GTGCCAGTTC GTAATGTCTG GTCAACTTTC 6520
CGACTCTGAG AAACTTCTGG AATCGCTAGA GAATTTCTGG AATGGGATTC 6570 
AGGAGTGGAC AGAACGACAC GGATATATAG TGGATGTGTC AAAACGCATA 6620 
CCATTTTGAA CGATGACCTC TAATAATTGT TAATCATGTT GGTTACGTAT 6670
TTATTAACTT CTCCTAGTAT TAQTAATTAT CATGGCTGTC ATGGCGCATT 6720
AACGGAATAA AGGGTGTGCT TAAATCGGGC CATTTTGCGT AATAAGAAAA 6770 
AGGATTAATT ATGAGCGAAT TGAATTAATA ATAAGGTAAT AGATTTACAT 6820 
TAGAAAATGA AAGGGGATTT TATGCGTGAG AATGTTACAG TCTATCCCGG 6870 
CAATAGTTAC CCTTATTATC AAGATAAGAA AGAAAAGGAT TTTTCGCTAC 6920
GCTCAAATCC TTTAAAAAAA CACAAAAGAC CACATTTTTT AATGTGGTCT 6970 
TTATTCTTCA ACTAAAGCAC CCATTAGTTC AACAAACGAA AATTGGATAA 7020 
AGTGGGATAT TTTTAAAATA TATATTTATG TTACAGTAAT ATTGACTTTT 7070 
AAAAAAGGAT TGATTCTAAT GAAGAAAGCA GACAAGTAAG CCTCCTAAAT 7120 
TCACTTTAGA TAAAAATTTA GGAGGCATAT CAAATGAACT TTAATAAAAT 7170
TGATTTAGAC AATTGGAAGA GAAAAGAGAT ATTTAATCAT TATTTGAACC 7220 
AACAAACGAC TTTTAGTATA ACCACAGAAA TTGATATTAG TGTTTTATAC 7270 
CGAAACATAA AACAAGAAGG ATATAAATTT TACCCTGCAT TTATTTTCTT 7320 
AGTGACAAGG GTGATAAACT CAAATACAGC TTTTAGAACT GGTTACAATA 7370 
GCGACGGAGA GTTAGGTTAT TGGGATAAGT TAGAGCCACT TTATACAATT 7420
TTTGATGGTG TATCTAAAAC ATTCTCTGGT ATTTGGACTC CTGTAAAGAA 7470 
TGACTTCAAA GAGTTTTATG ATTTATACCT TTCTGATGTA GAGAAATATA 7520 
ATGGTTCGGG GAAATTGTTT CCCAAAACAC CTATACCTGA AAATGCTTTT 7570 
TCTCTTTCTA TTATTCCATG GACTTCATTT ACTGGGTTTA ACTTAAATAT 7620 
CAATAATAAT AGTAATTACC TTCTACCCAT TATTACAGCA GGAAAATTCA 7670
TTAATAAAGG TAATTCAATA TATTTACCGC TATCTTTACA GGTACATCAT 7720 
TCTGTTTGTG ATGGTTATCA TGCAGGATTG TTTATGAACT CTATTCAGGA 7770 
ATTGTCAGAT AGGCCTAATG ACTGGCTTTT ATAATATGAG ATAATGCCGA 7820 
CTGTACTTTT TACAGTCGGT TTTCTAATGT CACTAACCTG CCCCGTTAGT 7870 
TGAAGAAGGT TTTTATATTA CAGCTCCAGA TCCATATCCT TCTTTTTCTG 7920
AACCGACTTC TCCTTTTTCG CTTCTTTATT CCAATTGCTT TATTGACGTT 7970 
GAGCCTCGGA ACCCNTATAG TGTGTTATAC TTTACTTGGA AGTGGTTGCC 8020 
GGAAAGAGCG AAAATGCCTC ACATTTGTGC CACCTAAAAA GGAGCGATTT 8070
ACATATGAGT TATGCAGTTT GTAGAATGCA AAAAGTGAAA TCAGGATCN 8119 
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 382 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Ala His
-15 -10 -5

Ala Tyr Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

Ser Leu Gly Gly Pro Ser Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Ala Ala Ala Gly Asn Glu Gly
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Ser Thr Val Gly Tyr Pro 
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ser Trp Gly Pro Ala Asp Asp 
1 5 7


(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Phe Ala Ser Gly Asn Gly Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Cys Asn Tyr Asp Gly Tyr Thr 
1 5 7

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

Ser Trp Gly Pro Glu Asp Asp 
1 5 7

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Trp Ala Ser Gly Asn Gly Gly

1 5 7 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 

Cys Asn Cys Asp Gly Tyr Thr 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Trp Ala Ser Gly Asp Gly Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Cys Asn Cys Asp Gly Tyr Ala
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Val Ile Asp Ser Gly Ile
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Asp Asn Asn Ser His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Ile Val Asp Asp Gly Leu
1 5 6

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 


Ser Asp Asp Tyr His 
1 5

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH : 6 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Ile Leu Asp Asp Gly Ile
1 5 6

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Asn Asp Asn Arg His
1 5 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Ile Met Asp Asp Gly Ile
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Trp Phe Asn Ser His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 


GCGGTTATCG ACGACGGTAT CGATTCT 27

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 


GCGGTTATCG ACAAAGGTAT CGATTCT 27
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 


GCGGTTATCG ACGAAGGTAT CGATTCT 27 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:


CCAAGACAAC GACTCTCACG GAA 23 

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 


CCAAGACAAC AGCTCTCACG GAA 23 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 


CCAAGACAAC AAATCTCACG GAA 23 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 42 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:


CACTTCCGGC AGCTCGTCGA CAGTGGACTA CCCTGGCAAA TA 42
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 


CACTTCCGGC AGCTCGTCGA CAGTGGAGTA CCCTGGCAAA TA 42
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:


TTAACATGAG CCTCGGCCCA GCTAGCGGTT CTGCTGCTTT A 41 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 43 base pairs 
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 


TTAACATGAG CCTCGGCCCC GCGGATGATT CTGCTGCTTT AAA 43 
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 


CGGCAGCTCA AGCAACGATG GCTATCCTGG CAAATACCCT TCTGTCA 47 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 44 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS : Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 


ACTTCCGGCA GCTCTTCGAA CTACGACGGG TACCCTGGCA AATA 44
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Asn Leu Thr Ala Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 35 : 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Asn Leu Met Arg Lys
1 5 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Thr Ala Ser Arg Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Leu Thr Arg Arg Ser 
1 5 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

Ala Leu Ser Arg Lys
1 5 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Leu Met Leu Arg Lys 
1 5

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Ala Ser Thr His Phe 
1 5 

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Gln Lys Pro Asn Phe
1 5 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Arg Lys Pro Thr His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Ile Gln Gln Gln Tyr
1 5 

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Arg Pro Gly Ala Met
1 5 

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 

Gln Gly Glu Leu Pro
1 5

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 

Ala Pro Asp Pro Thr 
1 5 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47 

Gln Leu Leu Glu His
1 5 

(2) INFORMATION FOR SEQ ID NO:48: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48 

Val Asn Asn Asn His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 




WO 96/27671 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

Ala Gln Ser Asn Leu
1 5 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

Thr Ala Ser Arg Arg
1 5 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

His His His His His His 
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Leu Met Arg Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

Leu Thr Ala Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

Gly Pro Gly Gly 
1 4

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

Gly Leu Met Arg Lys 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

Ala Ala Pro Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

Gly Pro Gly Gly Xaa Xaa Xaa Xaa Xaa Gly Gly Pro Gly 
1 5 10 13 

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

Ala Ala Pro Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 

Ala Ala Pro Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 

Ala Ala Pro Met 
1 4 

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

Ala Ala Pro Gln
1 4 

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Ala Ala Lys Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

Ala Ala Ala Phe 
1 4 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

Ala Ala Arg Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

Ala Ala Asp Phe
1 4

(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

Ala Ala Lys Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

Ala Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

Ala Ala Lys Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

Ala Ala Pro Xaa 
1 4 

(2) INFORMATION FOR SEQ ID NO:70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: 

Ala Ala Xaa Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 

Ala Ala Xaa Xaa Xaa

1 5 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 275 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 

Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala Pro Ala
1 5 10 15

Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val Ala Val
20 25 30

Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys Val Ala
35 40 45

Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp
50 55 60

Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala Leu
65 70 75

Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala Ser Leu
80 85 90

Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln Tyr Ser
95 100 105

Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn Met Asp
110 115 120

Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala Ala Leu
125 130 135

Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val Val Val
140 145 150

Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly Ala Val
170 175 180

Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly Pro Glu
185 190 195

Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr Leu Pro
200 205 210

Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala Ser Pro
215 220 225

His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His Pro Asn
230 235 240

Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr
245 250 255

Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn Val
260 265 270

Gln Ala Ala Ala Gln
275

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 


Arg Val Arg Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO:74: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1146 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 


GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT 36
Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe
-107 -105 -100

GCT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA 75
Ala Leu Ala Leu Ile Phe Thr Met Ala Phe Gly Ser Thr
-95 -90 -85

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG 114
Ser Ser Ala Gln Ala Ala Gly Lys Ser Asn Gly Glu Lys
-80 -75 -70

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG 153
Lys Tyr Ile Val Gly Phe Lys Gln Thr Met Ser Thr Met
-65 -60

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC 192
Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu Lys Gly
-55 -50 -45

GGG AAA GTG CAA AAG CAA TTC AAA TAT GTA GAC GCA GCT 231
Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala
-40 -35

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA 270
Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA 309
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val
-15 -10 -5

AGA CAT AAG CGC GCG CAG TCC GTG CCT TAC GGC GTA TCA 348
Arg His Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser
1 5

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT 387
Gln Ile Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr
10 15 20

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC 426
Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
25 30 35

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC 465
Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
40 45

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC 504
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn
50 55 60

GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT 543
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala
65 70

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC 582
Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser
75 80 85

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT 621
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly
90 95 100

TCC GGC CAA GAT AGC TGG ATC ATT AAC GGA ATC GAG TGG 660
Ser Gly Gln Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp
105 110

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC 699
Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu
115 120 125

GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT 738
Gly Gly Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val
130 135

GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA 777
Asp Lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala
140 145 150

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG 816
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC 855
Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
170 175

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC 894
Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser
180 185 190

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT 933
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser
195 200

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC 972
Ile Gln Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr
205 210 215

270 275

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 382 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 

Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Arg His
-15 -10 -5

Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:

Asn Arg Met Arg Lys 
1 5

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

Gly Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO:78: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 

Gly Asp Ile Thr Thr Glu Asp Glu Ala Ala Ser
1 5 10 11

(2) INFORMATION FOR SEQ ID NO:79: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 

Gly Glu Val Thr Asp Ala Val Glu Ala Arg Ser 
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO:80: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 

Pro Phe Met Thr Asp Ile Ile Glu Ala Ser Ser
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO:81: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 

Gly Ile Val Thr Asp Ala Ile Glu Ala Ser Ser
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO: 82:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 


GGTTCCGGCC AAGATAGCTG GATCATT 27 
(2) INFORMATION FOR SEQ ID NO:83: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 


CCAATACAGC TGGGAAATTA ACGGAATCG 29 

(2) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 


GGTTCCGGCC AAGATAGCTG GGAAATTAAC G 31
(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 


AAGAAGATCA CGTAAGACAT AAGCGCGCGC 30 
(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 

Arg Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

Lys Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 8 amino acids 
(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 

Gly Pro Gly Gly Leu Met Arg Lys 
1 5 8 

(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: 

Gly Pro Gly Gly Lys Ala Lys Arg 
1 5 8 
What is claimed is: 

1. A subtilisin variant derived from a precursor subtilisin-type serine protease said variant 
capable of cleaving a polypeptide substrate comprising the sequence: 

            O H 
            ‖ | 
P4-P3-P2-P1-C-N-P1' 

wherein; 

P4 is a basic amino acid; 
P3 is any amino acid selected from the naturally occurring amino acids; 

P2 is a basic amino acid; 
P1 is a basic amino acid; and 
P1' is not Pro. 

2. The subtilisin variant of claim 1 containing an acidic amino acid at a residue equivalent to 
Asn 62, Tyr 104 and Gly 166 of the subtilisin naturally produced by Bacillus amyloliquefaciens. 

3. The subtilisin-type serine protease variant of claim 2 wherein the acidic amino acid is Asp 
or Glu. 

4. The subtilisin-type serine protease variant of claim 3 wherein the acidic amino acid is Asp. 

5. The subtilisin-type serine protease variant of claim 2 wherein the precursor subtilisin-type 
serine protease is the subtilisin naturally produced by Bacillus amyloliquefaciens. 

6. The subtilisin variant of claim 5 having the amino acid sequence of the mature polypeptide 
of Figure 8 (SEQ ID NO: 75). 

7. A subtilisin variant having substrate specificity for peptide substrates containing dibasic 
amino acid sequences. 

8. The subtilisin variant of claim 7 having a different amino acid residue at residue position +62 
than subtilisin naturally produced by Bacillus amyloliquefaciens. 

9. The subtilisin variant of Claim 8 having an Asp or Glu at residue position +62. 

10. The subtilisin variant of Claim 9 having an Asp at residue position +62. 

11. The subtilisin variant of Claim 10 further having an Asp or Glu at residue position +166. 

12. The subtilisin variant of Claim 11 having an Asp at residue position +166. 

13. The subtilisin variant of Claim 12 having the amino acid sequence of the mature polypeptide 
provided in Fig. 6. 

14. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 1. 

15. The nucleic acid molecule of Claim 14 further comprising a promoter operably linked to the 
nucleic acid molecule. 

16. An expression vector comprising the nucleic acid molecule of Claim 15 operably linked to 
control sequences recognized by a host cell transformed with the vector. 

17. A host cell transformed with the vector of Claim 16. 

18. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 7. 

19. The nucleic acid molecule of Claim 18 further comprising a promoter operably linked to the 
nucleic acid molecule. 

20. An expression vector comprising the nucleic acid molecule of Claim 19 operably linked to 
control sequences recognized by a host cell transformed with the vector. 

21. A host cell transformed with the vector of Claim 20. 

22. A process of using the nucleic acid molecule encoding the subtilisin variant to effect 
production of the subtilisin variant comprising culturing the host cell of Claim 21 under conditions suitable 
for expression of the subtilisin variant. 

23. The process of Claim 22 further comprising recovering the subtilisin variant from the host 
cell culture medium. 

24. A method of using the subtilisin variant of Claim 1 comprising contacting a fusion protein 
containing a dibasic sequence with the subtilisin variant. 

25. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 

P4-P3-P2-P1-P1' 

wherein, 

P4 is a basic amino acid; 

P3 is an amino acid selected from the naturally occurring amino acids; 
P2 is a basic amino acid; 
P1 is a basic amino acid; and 

P1' is not Pro; 
comprising the step of: 

subjecting said polypeptide to the subtilisin variant of claim 1 in a reaction mixture under conditions 
such that the subtilisin variant cleaves the polypeptide. 

26. A process of using the nucleic acid molecule encoding the subtilisin variant to effect 
production of the subtilisin variant comprising culturing the host cell of Claim 17 under conditions suitable 
for expression of the subtilisin variant. 

27. The process of Claim 26 further comprising recovering the subtilisin variant from the host 
cell culture medium. 

28. A method of using the subtilisin variant of Claim 7 comprising contacting a fusion protein 
containing a dibasic sequence with the subtilisin variant. 

29. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 
P4-P3-P2-P1-P1' 
wherein, 

P4 is a large hydrophobic amino acid; 

P3 is an amino acid selected from the naturally occurring amino acids; 
P2 is a basic amino acid; 
P1 is a basic amino acid; and 
P1' is not Pro; 
comprising the step of: 

subjecting said polypeptide to the subtilisin variant of claim 7 in a reaction mixture under conditions 
such that the subtilisin variant cleaves the polypeptide. 
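
The P4-P3-P2-P1-P1' formulas recited in the process claims can be read as a five-residue window scanned along a substrate. The following Python sketch is illustrative only and forms no part of the claims. It assumes one-letter amino acid codes, assumes that "basic" means Lys or Arg, and assumes a membership for "large hydrophobic"; none of these residue sets is enumerated in the claims, and the names BASIC, LARGE_HYDROPHOBIC and find_sites are hypothetical.

```python
# Illustrative sketch; the residue classes are assumptions, not taken from the claims.
BASIC = set("KR")                   # assumed: Lys and Arg count as "basic"
LARGE_HYDROPHOBIC = set("FILMWY")   # assumed membership, for illustration only


def find_sites(seq, p4_allowed):
    """Return 0-based indices of the P1' residue for every window
    P4-P3-P2-P1-P1' in which P4 is in p4_allowed, P3 is any residue,
    P2 and P1 are basic, and P1' is not Pro ('P')."""
    sites = []
    for i in range(len(seq) - 4):
        p4, _p3, p2, p1, p1_prime = seq[i:i + 5]
        if p4 in p4_allowed and p2 in BASIC and p1 in BASIC and p1_prime != "P":
            sites.append(i + 4)
    return sites


if __name__ == "__main__":
    # Hypothetical substrate: basic P4 followed by a dibasic P2-P1 pair.
    substrate = "GPGGKAKRSA"
    for site in find_sites(substrate, BASIC):
        # The scissile bond lies between P1 and P1'.
        print(substrate[:site], substrate[site:])
```

Passing BASIC as p4_allowed models the formula in which P4 is a basic amino acid, and passing LARGE_HYDROPHOBIC models the formula in which P4 is a large hydrophobic amino acid. Each returned index marks the residue immediately after the scissile bond, so slicing the sequence at that index gives the two cleavage fragments.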